SF ElasticSearch Meetup 2012.10.03

Scaling ElasticSearch, SF Meetup 2012.10.03, Sushant Shankar (sushant.shankar@33across.com)

Upload: sushant-shankar

Posted on 26-Jun-2015


DESCRIPTION

Some thoughts on scaling ElasticSearch, especially related to index building and optimizing for query performance.

TRANSCRIPT

Page 1: SF ElasticSearch Meetup 2012.10.03

Scaling ElasticSearch

SF Meetup 2012.10.03

Sushant Shankar
sushant.shankar@33across.com

Page 2

Agenda

• Why we need a search engine
• Monitoring
• Index Building
• Query Performance

Page 3

Who is 33Across?

>600,000 publishers

Machine learning and graph algorithms to:
- Build advertising segments
- Extract insights out of social and interest data
- Target via high-performance distributed systems that integrate with our advertising partners

Website | Facebook | Twitter

Page 4

Why we really need a search engine

… …

Batch! Good for complicated tasks (Machine Learning, Graph Algorithms, etc.)

Page 5

INDEX BUILDING

1 WEEK → 3 HOURS

Page 6

Mappers to build index

Build the index using a MapReduce (MR) job and the Bulk API
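A mapper that feeds the Bulk API mostly has to turn input records into newline-delimited bulk bodies: one action line, then the document source, per document, with a trailing newline. The sketch below shows only that step and is not the talk's actual job. The index name `users`, type `user` and id field `user_id` are hypothetical placeholders, the `_type` field reflects ElasticSearch versions of this era, and sending each body to a node's `/_bulk` endpoint is left to whatever HTTP client you use.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def bulk_bodies(docs: Iterable[Dict[str, Any]], index: str, doc_type: str,
                id_field: str, batch_size: int) -> Iterator[str]:
    """Yield newline-delimited Bulk API bodies holding at most batch_size docs."""
    lines: List[str] = []
    count = 0
    for doc in docs:
        # Action line, then the document source on the following line.
        action = {"index": {"_index": index, "_type": doc_type,
                            "_id": str(doc[id_field])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
        count += 1
        if count == batch_size:
            # The Bulk API expects the body to end with a newline.
            yield "\n".join(lines) + "\n"
            lines, count = [], 0
    if lines:  # flush the final, partially filled batch
        yield "\n".join(lines) + "\n"


def mapper(input_lines: Iterable[str], batch_size: int = 10000) -> Iterator[str]:
    """Hypothetical streaming mapper: one JSON user document per input line."""
    docs = (json.loads(line) for line in input_lines if line.strip())
    # "users", "user" and "user_id" are placeholder names, not from the talk.
    return bulk_bodies(docs, "users", "user", "user_id", batch_size)
```

In a Hadoop Streaming-style job, each map task could iterate `mapper(sys.stdin)` and POST every body it yields, so the cluster receives bulk requests from many mappers in parallel.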

6 nodes:
• 24GB RAM (16GB for ES service)
• 4 cores
• 3x 1.5TB drives

• >1TB/index (replicated)
• ~300M documents
• ~5KB/document
• ~3 hours to build
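The figures on this slide allow a rough throughput estimate. This back-of-envelope calculation assumes the approximate values (~300M documents, ~5KB each, ~3 hours) are exact, that 1 week = 168 hours, that 1 KB = 1,000 bytes, and that load is spread evenly over the 6 nodes; it counts raw input documents only and ignores replication and index overhead.

```latex
\begin{aligned}
\text{speedup} &= \frac{1\ \text{week}}{3\ \text{h}} = \frac{168\ \text{h}}{3\ \text{h}} = 56\times \\
\text{throughput} &\approx \frac{3 \times 10^{8}\ \text{docs}}{3 \times 3600\ \text{s}} \approx 2.8 \times 10^{4}\ \text{docs/s} \quad (\approx 4.6 \times 10^{3}\ \text{docs/s per node}) \\
\text{input rate} &\approx 2.8 \times 10^{4}\ \text{docs/s} \times 5\ \text{KB} \approx 140\ \text{MB/s} \quad (\approx 23\ \text{MB/s per node})
\end{aligned}
```

At 10,000 documents per bulk request, that is about 30,000 requests in total, or roughly 2.8 requests per second across the cluster.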

Page 7

Monitoring: Zabbix

Page 8

Monitoring: SPM

Page 9

Parameter Optimization

Parameters varied:
• Amount bulk indexed
• # Shards

Measured:
• Time taken
• CPU util.
• Mem util.
• Disk I/O
• Network
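The parameter sweep on this slide can be sketched as a small timing harness over every combination of bulk size and shard count. This is a hypothetical sketch: `run_trial` stands in for whatever recreates the index with a given shard count and bulk-loads a fixed sample at a given batch size, and is not from the talk. It records only wall-clock time; CPU, memory, disk I/O and network would still come from monitoring such as Zabbix or SPM.

```python
import itertools
import time
from typing import Callable, Dict, Iterable, Tuple

# (documents per bulk request, number of primary shards)
Trial = Tuple[int, int]


def sweep(run_trial: Callable[[int, int], None],
          bulk_sizes: Iterable[int],
          shard_counts: Iterable[int]) -> Dict[Trial, float]:
    """Wall-clock each (bulk size, shard count) combination once."""
    results: Dict[Trial, float] = {}
    for bulk_size, shards in itertools.product(bulk_sizes, shard_counts):
        start = time.perf_counter()
        # run_trial is expected to recreate the index and load a fixed sample.
        run_trial(bulk_size, shards)
        results[(bulk_size, shards)] = time.perf_counter() - start
    return results


def fastest(results: Dict[Trial, float]) -> Trial:
    """Return the combination with the smallest elapsed time."""
    return min(results, key=lambda trial: results[trial])
```

Timing each combination on the same fixed sample keeps trials comparable; a single run per combination is noisy, so repeating trials and comparing medians would be more robust.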

Page 10

Index Building: Learnings

• Bulk API
• No replicas
• 2 shards / CPU
• 10,000 documents (users) per indexing request
• Refresh off (index.refresh_interval = -1)
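The build-time learnings map onto index settings supplied when the index is created. The sketch below builds such a create-index body; it assumes "2 shards / CPU" means two primary shards per core across the whole cluster, which is an interpretation rather than something the slide states. The body would be sent as `PUT /<index>` with a hypothetical index name. In ElasticSearch of this era, `number_of_shards` is fixed when the index is created, while `number_of_replicas` and `refresh_interval` can be changed later through the update settings API.

```python
import json
from typing import Any, Dict

# Documents (users) per bulk request, per the slide.
BULK_BATCH_SIZE = 10000


def build_time_index_body(num_nodes: int, cores_per_node: int) -> Dict[str, Any]:
    """Create-index body tuned for a one-off bulk build."""
    total_cores = num_nodes * cores_per_node
    return {
        "settings": {
            "index": {
                # Assumption: "2 shards / CPU" = 2 primary shards per core, cluster-wide.
                "number_of_shards": 2 * total_cores,
                # No replicas while building; add them once the build finishes.
                "number_of_replicas": 0,
                # "-1" disables periodic refresh during the build.
                "refresh_interval": "-1",
            }
        }
    }


if __name__ == "__main__":
    # 6 nodes x 4 cores, as on the hardware slide -> 48 primary shards.
    print(json.dumps(build_time_index_body(6, 4), indent=2))
```

Under that interpretation, the 6-node, 4-core cluster from the hardware slide gets 48 primary shards, or 8 per node.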

Page 11

QUERY PERFORMANCE

5 MINUTES → 10 SECONDS

Page 12

Query Performance: Learnings

• 1-2 replicas (for query performance, and for reliability)
• Turn refresh on again (5s; note that the ElasticSearch default is 1s)
• Warm-up effect (Index Warmers API, 0.20+)
• Optimize API
• Simulate multiple users
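Once the build finishes, the query-side learnings amount to a short sequence of admin requests. The sketch below only assembles that sequence as (method, path, body) tuples and sends nothing. The ordering is an assumption, not the talk's procedure: optimizing before adding replicas is meant to let replicas start from already-merged segments. The `_optimize` endpoint was later renamed `_forcemerge`, the Warmers API was later removed, and the warmer name `warmup` is hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

# (HTTP method, path, JSON body or None)
Request = Tuple[str, str, Optional[Dict[str, Any]]]


def post_build_plan(index: str, replicas: int = 1, refresh: str = "5s",
                    warmer_name: str = "warmup",
                    warmer_search: Optional[Dict[str, Any]] = None) -> List[Request]:
    """Ordered requests that switch a freshly built index to query serving."""
    plan: List[Request] = [
        # Turn periodic refresh back on (the slide used 5s).
        ("PUT", f"/{index}/_settings", {"index": {"refresh_interval": refresh}}),
        # Optimize API: merge segments down to one per shard.
        ("POST", f"/{index}/_optimize?max_num_segments=1", None),
        # Assumption: adding replicas after merging lets them copy merged segments.
        ("PUT", f"/{index}/_settings", {"index": {"number_of_replicas": replicas}}),
    ]
    if warmer_search is not None:
        # Index Warmers API (0.20+): runs this search body against new segments.
        plan.append(("PUT", f"/{index}/_warmer/{warmer_name}", warmer_search))
    return plan
```

Keeping the plan as plain data makes it easy to review or dry-run before pointing any HTTP client at the cluster.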

Page 13

Warm Up: load into memory and cache
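Warm-up and multi-user simulation can be combined in a simple client-side benchmark: run representative queries untimed so data is loaded into memory and caches, then replay them concurrently and record latencies. This is a sketch under assumptions: `run_query` is a hypothetical callable wrapping your search client, and the thread pool keeps up to `users` queries in flight rather than modelling individual sessions. Client-side warm-up like this is an alternative to, not the same as, registering an index warmer.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def _timed(run_query: Callable[[str], object], query: str) -> float:
    start = time.perf_counter()
    run_query(query)
    return time.perf_counter() - start


def simulate_users(run_query: Callable[[str], object], queries: Sequence[str],
                   users: int, warmup_rounds: int = 1) -> List[float]:
    """Warm up untimed, then replay queries with up to `users` in flight."""
    # Warm-up: untimed passes so index data is pulled into memory and caches.
    for _ in range(warmup_rounds):
        for query in queries:
            run_query(query)
    # Measured phase: each simulated user issues every query once.
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(_timed, run_query, q)
                   for _ in range(users) for q in queries]
        return [f.result() for f in futures]


def summarize(latencies: List[float]) -> Dict[str, float]:
    """Median and worst-case latency in seconds."""
    return {"median": statistics.median(latencies), "max": max(latencies)}
```

Comparing `summarize` output with and without the warm-up rounds is one way to make the warm-up effect visible.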

Page 14

Other cool features

• Custom scoring functions
• Scripts – MVEL, Python
• Facets

• Exploring:
  • Real-time indexing
  • Indexing images, files, etc.
  • Parent-child relationships
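Custom scoring, scripts and facets can all appear in a single search body. The sketch below builds one in the query DSL of this era: a `custom_score` query whose MVEL script scales the text score by a numeric field, plus a `terms` facet. The field names `popularity` and `interests` are hypothetical, and later ElasticSearch versions replaced `custom_score` with `function_score` and facets with aggregations.

```python
import json
from typing import Any, Dict


def scored_faceted_search(text: str, boost_field: str, facet_field: str,
                          factor: float = 1.0, facet_size: int = 10) -> Dict[str, Any]:
    """Search body using era-specific custom_score scoring plus a terms facet."""
    return {
        "query": {
            "custom_score": {
                "query": {"query_string": {"query": text}},
                # MVEL script: scale the text score by a numeric document field.
                "script": f"_score * doc['{boost_field}'].value * factor",
                "params": {"factor": factor},
            }
        },
        # Terms facet: counts of the most frequent values of facet_field.
        "facets": {
            "top_values": {"terms": {"field": facet_field, "size": facet_size}}
        },
    }


if __name__ == "__main__":
    print(json.dumps(scored_faceted_search("hiking", "popularity", "interests"), indent=2))
```

Passing `factor` through `params` rather than formatting it into the script string lets the same script text be reused across requests.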

Page 15

QUERIES?

Page 16

Index Building over time