BlaBlaCar ElasticSearch feedback

Introduction

Nicolas Blanc - BlaBlArchitect

Sinfomic (1999)

@thewhitegeek

(2001)

(2005)

(2008)

(2012)

What is BlaBlaCar ?

3,000,000 members in Europe

10 countries

● France
● Spain
● Italy
● UK
● Poland
● Portugal
● Netherlands
● Belgium
● Luxembourg
● Germany (new)

Growth

[Chart: growth curve with a time axis starting in January 2008 and axis marks at 25 million and 50 million]

Infrastructure

● 2 front web servers
● 2 MySQL masters (+4 slaves on SSD)
● 1 private cloud (KVM + Open vSwitch)
   ● Redis
   ● Memcache
   ● RabbitMQ/workers
● 1 ElasticSearch cluster

Changing the Search Engine

What exists today? Why change?

MySQL database
● Relational DB (lots of joins needed)
● Plain SQL queries
● Home-made geographical search

Recent problems
● New features mean more complex queries
● Scalability: performance depends on DB load

Initial requirements

Scalability
● A trip search must complete in less than 200 ms
● The system side of the solution must be easy to maintain
● It must be possible to cluster it (also to avoid a SPOF)

Low code impact on the existing application
● Same features as today (geographical search)
● Minimize the developers' work
● Add one missing feature: facets

Initial Competitors

SenseiDB

Why ElasticSearch

✔ Easiest to cluster
✔ Good indexing performance
✔ Little code to write to use it
✔ Schemaless
✔ Based on Lucene
✔ Written in Java (we need to code a grouping feature)

ElasticSearch won; now let's migrate our search!

Changing our mindset

Object in a relational database
● Can be split across multiple tables
● Lots of information reachable through JOINs

Object in a document-oriented database
● Only one big index for these objects
● All information needs to be in the object itself, not spread over multiple tables
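To make the mindset change concrete, here is a minimal sketch of flattening rows from several tables into one self-contained document. The table layout and column names (`departure`, `seats_booked`, `blocked`, `model`, and so on) are hypothetical and chosen only for illustration; they are not BlaBlaCar's actual schema.

```python
def build_trip_document(trip_row, user_row, vehicle_row):
    """Merge rows from three hypothetical tables into one document.

    In SQL this information would be reached with JOINs at query time;
    here it is copied into the document once, at indexing time.
    """
    return {
        "trip_date": trip_row["departure"],
        "from": {"lat": trip_row["from_lat"], "lon": trip_row["from_lon"]},
        "to": {"lat": trip_row["to_lat"], "lon": trip_row["to_lon"]},
        "price": trip_row["price"],
        # Derived value: computed when indexing, not when searching.
        "seats_left": trip_row["seats_offered"] - trip_row["seats_booked"],
        "userid": str(user_row["id"]),
        "user_ratings": user_row["ratings"],
        "is_user_blocked": user_row["blocked"],
        "vehicle": vehicle_row["model"],
    }


example = build_trip_document(
    {"departure": "2013-06-01T08:00:00", "from_lat": 48.856614,
     "from_lon": 2.3522219, "to_lat": 45.764043, "to_lon": 4.835659,
     "price": 25.0, "seats_offered": 3, "seats_booked": 1},
    {"id": 42, "ratings": 5, "blocked": False},
    {"model": "hatchback"},
)
```

The price of this approach is that a change in any source row (for example, the member gets blocked) means the whole document has to be rebuilt and reindexed.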


Defining our objects well

Know what we want to search
● Trips (front office usage)
● Members (back office usage)
● FAQ (front office usage)

Think of all the needed fields
● The ones used for queries
● The ones used for filters
● The ones used for facets

Defining the index well

System point of view
● Number of nodes in the cluster
● Number of shards
● Number of replicas

Application point of view
● Define the type and attributes of every field (mapping)
● Use parent/child or nested documents to improve indexing
● How do we push documents from the DB?

Indexing: using a river or not?

River advantages
● Plugs directly into our source backend
● An ElasticSearch API exists to code a new one

River problems
● Not easy to add business logic on some fields
● Really hard when your DB is unconventional
● Full reindex of all the documents

Indexing: our manual way

We wrote an asynchronous indexer
● Written in Java
● Applies business logic when fetching from the DB
● Fetches from multiple DBs/sources
● Uses the Java ES library
● Easy interface
   ● Send {"trip":1234567} and the server answers {"OK"}

One index sample : Trip

Defining our Trip object well

Think of all the needed fields
● The ones used for queries
   ● Trip departure date, from where, to where, user id
● The ones used for filters
   ● User ratings, price, vehicle, seats left, is user blocked (a blocked user is a user who performed a forbidden action on the website)
● The ones used for facets
   ● User ratings, price, vehicle
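As a rough illustration of the difference between a filter field and a facet field, the sketch below first narrows a list of documents and then counts the remaining ones per value. It is a plain in-memory toy with made-up documents, not the ElasticSearch facet implementation.

```python
from collections import Counter


def apply_filter(docs, field, value):
    """Keep only the documents whose field equals the given value."""
    return [doc for doc in docs if doc.get(field) == value]


def terms_facet(docs, field):
    """Count how many documents carry each distinct value of a field."""
    return dict(Counter(doc[field] for doc in docs if field in doc))


# Hypothetical trips, using field names from the slide.
trips = [
    {"vehicle": "sedan", "price": 20.0, "is_user_blocked": False},
    {"vehicle": "van", "price": 15.0, "is_user_blocked": False},
    {"vehicle": "sedan", "price": 30.0, "is_user_blocked": True},
    {"vehicle": "sedan", "price": 22.0, "is_user_blocked": False},
]

visible = apply_filter(trips, "is_user_blocked", False)
vehicle_counts = terms_facet(visible, "vehicle")
```

A facet like `vehicle_counts` is what lets a search page show "sedan (2), van (1)" next to the results.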

Defining our Trip index well

Think of all the system requirements
● The cluster has 2 nodes
● We keep the default configuration for shards/replicas

Think of the object mapping
● For each field:
   ● Define the type (string, long, geo_point, date, float, boolean)
   ● Define the scope (include_in_all)
   ● Define the analyzer (for type string)

Trip Mapping

"trip": {
  "properties": {
    "is_user_blocked": { "type": "boolean", "include_in_all": false },
    "user_ratings": { "type": "long", "include_in_all": false },
    "from": { "type": "geo_point", "include_in_all": false },
    "price": { "type": "float", "include_in_all": false },
    "price_euro": { "type": "float", "include_in_all": false },
    "seats_left": { "type": "long", "include_in_all": false },
    "seats_offered": { "type": "long", "include_in_all": false },
    "to": { "type": "geo_point", "include_in_all": false },
    "trip_date": { "type": "date", "format": "dateOptionalTime", "include_in_all": false },
    "vehicle": { "type": "string", "include_in_all": false },
    "userid": { "type": "string", "index": "not_analyzed", "include_in_all": false }
  }
}

Indexing events well

Which modifications send a change event
● Any trip creation/deletion/modification
● Member modifications (blocked or not)
● New ratings from other members
● A seat has been reserved
● A member changes their vehicle

A change event is a call to the internal indexer
● Send '{"trip":123456}' to the indexer (create/update)
● Send '{"tripd":123456}' to the indexer (delete)
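The two messages above suggest a very small protocol: one key naming the action, one document id. The real indexer is written in Java and its code is not shown in the slides, so the following is only a sketch of how such messages could be decoded; the function name `parse_event` and the `ACTIONS` table are hypothetical.

```python
import json

# Message key -> (document type, action). Only the two keys shown on the
# slide are known; this table is an assumption about how they map.
ACTIONS = {
    "trip": ("trip", "index"),    # create or update
    "tripd": ("trip", "delete"),
}


def parse_event(message):
    """Decode a change event such as '{"trip":123456}'.

    Returns a (document type, action, document id) tuple and raises
    ValueError for anything that does not look like a known event.
    """
    payload = json.loads(message)
    if not isinstance(payload, dict) or len(payload) != 1:
        raise ValueError("expected a JSON object with exactly one key")
    (key, doc_id), = payload.items()
    if key not in ACTIONS:
        raise ValueError("unknown event key: %r" % key)
    doc_type, action = ACTIONS[key]
    return doc_type, action, int(doc_id)
```

Because the message carries only an id, the indexer stays in charge of fetching the data and applying the business logic, which is what keeps the calling application's code impact low.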

Sample trip index query

{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "and": [
          { "geo_distance": { "distance": "40.14937866995km", "from": { "lat": 48.856614, "lon": 2.3522219 } } },
          { "geo_distance": { "distance": "40.14937866995km", "to": { "lat": 45.764043, "lon": 4.835659 } } },
          { "range": { "price": { "from": 0, "include_lower": false } } }
        ]
      }
    }
  },
  "sort": [ { "trip_date": { "order": "asc" } } ],
  "filter": { "term": { "is_user_blocked": false } },
  "from": 0,
  "size": 10
}
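Conceptually, each geo_distance clause in the query keeps the trips whose point lies within a radius of a center. The sketch below shows that idea with the haversine great-circle formula on a spherical Earth; it is an approximation for illustration and makes no claim about how ElasticSearch computes distances internally. The Versailles coordinates are an added example, not part of the slides.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within(point, center, radius_km):
    """True when point is at most radius_km away from center."""
    return haversine_km(point["lat"], point["lon"],
                        center["lat"], center["lon"]) <= radius_km


PARIS = {"lat": 48.856614, "lon": 2.3522219}   # "from" center in the query
LYON = {"lat": 45.764043, "lon": 4.835659}     # "to" center in the query
VERSAILLES = {"lat": 48.8049, "lon": 2.1204}   # hypothetical departure point

RADIUS_KM = 40.14937866995  # radius used in the sample query
departure_matches = within(VERSAILLES, PARIS, RADIUS_KM)
lyon_is_near_paris = within(LYON, PARIS, RADIUS_KM)
```

A trip leaving from Versailles would therefore match a search from Paris, while a trip leaving from Lyon would not.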

The Real World

A trip now has more than 30 fields
● (FAQ is around 25 fields)
● (members even more...)

Building a trip document takes 3 different SQL queries
● (FAQ: 2 different SQL queries)
● (Member: 10 different SQL queries)

The trip index has only 1 shard (because of grouping)

And now the caveats

Preloaded Scripts

We use MVEL scripts to improve scoring
● They are not clustered
● Each node needs to have the scripts
● A node restart is needed to add or modify them

Solution: Chef (tool from Opscode). All node configurations are centralized in a Chef repository

Grouping documents

Home-made patches to ElasticSearch (based on work by Martijn van Groningen for lusini.de)

Coming soon in ElasticSearch (I hope so much)

Mapping modification

On a running index:
● Changing a field type is not allowed
● Changing an analyzer is not allowed

Solution: index alias
1) Mapping change → create a new index
2) When the new index is up to date → switch the alias
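The alias switch in step 2 can be expressed as a single request to the ElasticSearch `_aliases` endpoint, which takes a list of `remove` and `add` actions and applies them together. The sketch below only builds that request body; the alias and index names (`trips`, `trips_v1`, `trips_v2`) are hypothetical, and sending the request is left out.

```python
import json


def alias_swap_actions(alias, old_index, new_index):
    """Build the body that moves an alias from old_index to new_index.

    Both actions travel in one request, so searches through the alias
    never see a moment where it points to no index.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: the application always searches "trips".
body = alias_swap_actions("trips", "trips_v1", "trips_v2")
payload = json.dumps(body)  # would be POSTed to the _aliases endpoint
```

Since the application only ever knows the alias, the reindexing into the new index can take as long as needed without any code or configuration change on the search side.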

IOs limits

We have only 2 nodes
● The Trip index is around 2 GB
● But there is only 1 shard for the Trip index
● Up to 100 trips indexed per second on a busy evening

Solution: we installed Intel SSDs (while waiting for a distributed grouping feature)

Choosing the analyzer

Some fields must not be analyzed
● For example, ISO country codes (IT for Italy or DE for Germany are ignored in some cases)

A global analyzer has limits
● Accented characters from countries like France, Germany or Spain are not always parsed correctly
● One analyzer per country is difficult to implement in some cases
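The slides do not say why country codes get ignored. One plausible cause, assumed here, is stop-word removal: once lowercased, "IT" and "DE" collide with common words such as English "it" and French or Spanish "de". The toy analyzer below imitates that behaviour along with accent folding; the stop-word list and function names are illustrative and do not reproduce any real ElasticSearch analyzer.

```python
import unicodedata

# Tiny illustrative stop-word list (assumption, not a real analyzer's list).
STOPWORDS = {"it", "de", "the", "a"}


def fold_accents(text):
    """Strip diacritics: decompose characters, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text):
    """Toy analyzed field: fold accents, lowercase, split, drop stop words."""
    tokens = fold_accents(text).lower().split()
    return [token for token in tokens if token not in STOPWORDS]


def not_analyzed(text):
    """Toy not_analyzed field: the whole value is kept as one exact term."""
    return [text]


lost_code = analyze("IT")        # the country code disappears entirely
kept_code = not_analyzed("IT")   # the country code survives untouched
folded_city = fold_accents("Orléans")
```

This is why a field like `userid` in the mapping is declared `not_analyzed`, and why the same choice makes sense for country codes.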

OK, sweet. What's next?

Using ElasticSearch to ease log analysis

By the way…

We're hiring! Dev, HTML ninja, leader, …

Come and see me right now… or send me your friends

(And we have beer, table football and an arcade cabinet)

Thank you !

Follow us !

@covoiturage

Apply now :

join@BlaBlaCar.com
