elastic @deezer

Elastic @Deezer Aurélien Saint Requier, Search Data Scientist ELASTIC @DEEZER

Upload: aurelien-saint-requier

Post on 08-Apr-2017


Category:

Technology



TRANSCRIPT

Page 1: Elastic @Deezer

Elastic @Deezer

Aurélien Saint Requier, Search Data Scientist

ELASTIC @DEEZER

Page 2: Elastic @Deezer

Table of contents

/01 Where?

/02 Elasticsearch architecture

/03 Querying Elasticsearch

/04 ELK stack for analysis


Page 3: Elastic @Deezer

Where?

/01


Page 4: Elastic @Deezer

For search features


Page 5: Elastic @Deezer

For chart and new release features


Page 6: Elastic @Deezer

For recommendation features


Page 7: Elastic @Deezer

Elasticsearch Architecture

/02


Page 8: Elastic @Deezer

Elasticsearch architecture: Our needs

● Search and recommend:

○ 3 million artists

○ 5 million albums

○ 50 million tracks

○ 2 million playlists

● Search and recommend content based on:

○ metadata and other features

○ tag descriptions

● New releases should become available in less than 2 hours

● Queries have to respond in less than 100 ms

Page 9: Elastic @Deezer

Elasticsearch architecture: Overview

Page 10: Elastic @Deezer

Elasticsearch architecture: Data workflow

Page 11: Elastic @Deezer

Elasticsearch architecture: Data workflow

Page 12: Elastic @Deezer

How do we deploy full indexes in production?

1. Get JSON data from the Hadoop cluster (using WebHDFS)

2. Index documents on mastersearch (using the ES bulk API)

3. Package the new index:

3.1. Compress the ES index directory

3.2. Generate a deployment script

4. Copy the package to the temporary node of each cluster (using assassin, a homemade rsync deploy script)

5. Run the deployment script:

5.1. Start a temporary ES instance and load the new index

5.2. Set the required number of replicas

5.3. Wait until the data is replicated, then shut down the temporary ES instance

5.4. Warm the new index

5.5. Switch the alias to the new index and close the old index
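Steps 2 and 5.5 can be illustrated with the request bodies that the bulk API and the index aliases API expect. This is a minimal sketch, not the actual Deezer scripts: the index names, the document type and the helper names are hypothetical, and sending the bodies over HTTP is left out.

```python
import json


def bulk_index_body(index, doc_type, docs):
    """Build the newline-delimited body for POST /_bulk.

    `docs` is an iterable of (doc_id, source_dict) pairs. Each document
    becomes two lines: an action line and the document source.
    """
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk API requires every line, including the last, to end with \n.
    return "".join(line + "\n" for line in lines)


def alias_switch_actions(alias, old_index, new_index):
    """Build the body for POST /_aliases.

    Both actions are sent in one request, so the alias moves from the
    old index to the new one in a single atomic operation.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(bulk_index_body("tracks_v2", "track", [(1, {"title": "Black Pearl"})]))
    print(json.dumps(alias_switch_actions("tracks", "tracks_v1", "tracks_v2")))
```

Because clients query the alias rather than a concrete index name, the switch in step 5.5 does not require any change on the client side.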

Page 13: Elastic @Deezer

Querying Elasticsearch

/03


Page 14: Elastic @Deezer

How do we analyze musical data?

Use custom analyzers

Black Pearl (He's A Pirate) [feat. Sidney Housen] - EP

The Black Eyed Peas

● Lowercase, asciifolding and char filters, music field synonyms:

● Edge_ngram tokenizer:
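The analyzer definitions shown on the slide are not part of this transcript. As a rough illustration, the settings below sketch what two such custom analyzers could look like. Every name and value (the `music_*` identifiers, the synonym list, the gram sizes) is an assumption, not the configuration used at Deezer. The small pure-Python functions only approximate the effect of lowercase, asciifolding and edge n-gram tokenization so the output can be inspected without a cluster.

```python
import re
import unicodedata

# Hypothetical index settings: all names and values are assumptions.
MUSIC_ANALYSIS_SETTINGS = {
    "analysis": {
        "char_filter": {
            "music_chars": {"type": "mapping", "mappings": ["&=>and"]}
        },
        "filter": {
            "music_synonyms": {
                "type": "synonym",
                "synonyms": ["feat, featuring, ft"],
            }
        },
        "tokenizer": {
            "music_edge_ngram": {
                "type": "edgeNGram",
                "min_gram": 2,
                "max_gram": 10,
                "token_chars": ["letter", "digit"],
            }
        },
        "analyzer": {
            # Full-token analyzer: char filter, lowercase, asciifolding, synonyms.
            "music_text": {
                "type": "custom",
                "char_filter": ["music_chars"],
                "tokenizer": "standard",
                "filter": ["lowercase", "asciifolding", "music_synonyms"],
            },
            # Prefix analyzer: edge n-grams of each word.
            "music_prefix": {
                "type": "custom",
                "tokenizer": "music_edge_ngram",
                "filter": ["lowercase", "asciifolding"],
            },
        },
    }
}


def fold(text):
    """Approximate lowercase + asciifolding: strip accents, then lowercase."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii").lower()


def edge_ngrams(token, min_gram, max_gram):
    """Return the prefixes of `token` with lengths min_gram..max_gram."""
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


def analyze_prefix(text, min_gram=2, max_gram=10):
    """Approximate the prefix analyzer: split on non-alphanumerics, emit edge n-grams."""
    grams = []
    for token in re.findall(r"[a-z0-9]+", fold(text)):
        grams.extend(edge_ngrams(token, min_gram, max_gram))
    return grams


if __name__ == "__main__":
    print(analyze_prefix("The Black Eyed Peas", min_gram=2, max_gram=3))
```

Indexing the prefixes up front is what makes search-as-you-type possible with plain match queries instead of prefix queries.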

Page 15: Elastic @Deezer

How do we search our data?

● Using an internal Java Elasticsearch plugin:

Page 16: Elastic @Deezer

How do we search our data?

● Using the Multi Search API and Query DSL:
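The query shown on the slide is not included in this transcript. The sketch below shows the general shape of a Multi Search request body: one header line and one query line per search, newline-delimited. The index names, field names and the helper are hypothetical.

```python
import json


def msearch_body(text, targets, size=5):
    """Build the newline-delimited body for POST /_msearch.

    `targets` is a list of (index_name, field_name) pairs. The same user
    text is searched in each index with a Query DSL match query.
    """
    lines = []
    for index, field in targets:
        lines.append(json.dumps({"index": index}))  # header line
        lines.append(json.dumps({"query": {"match": {field: text}}, "size": size}))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Hypothetical indexes and fields, for illustration only.
    targets = [("artists", "name"), ("albums", "title"), ("tracks", "title")]
    print(msearch_body("black eyed", targets))
```

Grouping the artist, album and track searches into one request keeps a single round trip per user query, which helps when every response has to come back within a tight latency budget.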

Page 17: Elastic @Deezer

How do we recommend our data?

● Using function score queries:
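The slide's query is not part of this transcript. As an illustration, a function score query wraps a normal query and adjusts each hit's score with a function. The sketch below uses `field_value_factor` to boost by a numeric field; the field names and the choice of function are assumptions, not the scoring used at Deezer.

```python
def popularity_boosted_query(text, field="title", popularity_field="rank"):
    """Build a function_score query body.

    The text relevance score of each hit is multiplied by
    log(1 + popularity), so popular items rank higher among matches.
    Both field names are hypothetical.
    """
    return {
        "query": {
            "function_score": {
                "query": {"match": {field: text}},
                "field_value_factor": {
                    "field": popularity_field,
                    "modifier": "log1p",
                    "factor": 1.0,
                },
                "boost_mode": "multiply",
            }
        }
    }


if __name__ == "__main__":
    import json

    print(json.dumps(popularity_boosted_query("pirate"), indent=2))
```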

Page 18: Elastic @Deezer

How do we explore our data?

● Using aggregations:
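The aggregation shown on the slide is not in this transcript. A minimal sketch of the idea, assuming a hypothetical keyword-like field, is a terms aggregation that counts documents per distinct value:

```python
def top_terms_aggregation(field, size=10):
    """Build a search body that returns only aggregation buckets.

    "size": 0 at the top level suppresses the hits; the terms
    aggregation returns the `size` most frequent values of `field`.
    """
    return {
        "size": 0,
        "aggs": {
            "top_values": {"terms": {"field": field, "size": size}}
        },
    }


if __name__ == "__main__":
    # "genre" is a hypothetical field name.
    print(top_terms_aggregation("genre", size=5))
```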

Page 19: Elastic @Deezer

Some feedback

● In numbers:

○ More than 25 million queries a day, around 5000 queries / minute

○ Around 95% of queries respond in less than 100 ms

● Lessons learned:

○ Be careful with fielddata usage

○ Big JVM ES instance = long GC time

○ Avoid prefix queries: use an edge-ngram tokenizer and do match queries*

● In the future:

○ Use a dedicated client/data/master architecture

○ Stop fuzzy queries (replaced by a "Did you mean" approach)*

○ Migrate to Elasticsearch v2

*https://www.elastic.co/blog/elasticsearch-queries-or-term-queries-are-really-fast
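To make the prefix-query lesson concrete, the sketch below builds both query bodies side by side. It assumes a hypothetical field that was indexed with an edge n-gram analyzer and is searched with an analyzer that does not produce n-grams, so that the user's input is matched as a whole term against the indexed prefixes.

```python
def prefix_query(field, text):
    """Query-time approach: expand the prefix against the terms of `field`
    on every request. Prefix queries are not analyzed, hence the lower()."""
    return {"query": {"prefix": {field: text.lower()}}}


def edge_ngram_match_query(field, text):
    """Index-time approach: `field` already contains every prefix of each
    word as its own term, so a plain match query is enough."""
    return {"query": {"match": {field: text}}}


if __name__ == "__main__":
    # Field names are hypothetical.
    print(prefix_query("title", "Bla"))
    print(edge_ngram_match_query("title.prefix", "Bla"))
```

The trade-off is a larger index in exchange for cheaper queries, since the prefix expansion is paid once at indexing time.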

Page 20: Elastic @Deezer

ELK for analysis

/04


Page 21: Elastic @Deezer

Use of ELK

● Elasticsearch v1.7.5:

○ cluster of 3 nodes

○ indexes logs from Logstash and homemade scripts

○ around 2 billion documents

● Logstash 1.5

● Kibana v4.1.1

○ 26 dashboards / 189 visualisations

● Tools:

○ curator for index retention

○ elasticdump for saving Kibana settings
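Index retention boils down to deleting time-based indexes older than a cutoff. The sketch below is not curator itself, only a plain-Python illustration of that selection logic. It assumes daily indexes named with Logstash's default `logstash-YYYY.MM.DD` pattern and a hypothetical retention period.

```python
from datetime import date, datetime, timedelta


def expired_indices(names, today, keep_days, prefix="logstash-"):
    """Return the daily indexes whose date is older than `keep_days` days.

    Names that do not start with `prefix`, or whose suffix is not a
    YYYY.MM.DD date, are ignored.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue
        if day < cutoff:
            expired.append(name)
    return expired


if __name__ == "__main__":
    # Hypothetical index names and retention period.
    names = ["logstash-2016.01.01", "logstash-2016.01.30", ".kibana"]
    print(expired_indices(names, today=date(2016, 1, 31), keep_days=7))
```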

Page 22: Elastic @Deezer

Use cases: Monitoring

Page 23: Elastic @Deezer

Use cases: Analysis of what our users search

Page 24: Elastic @Deezer

Thanks for your attention

We are hiring!

jobs.deezer.com

Questions?