sf elasticsearch meetup 2012.10.03
DESCRIPTION
Some thoughts on scaling ElasticSearch, especially related to index building and optimizing for query performance.
TRANSCRIPT
Agenda
• Why we need a search engine
• Monitoring
• Index Building
• Query Performance
Who is asdfas
>600,000 Publishers
Machine Learning and Graph algorithms to:
- Build advertising segments
- Extract insights out of social and interest data
- Target via high-performance distributed systems that integrate with our advertising partners
Website | Facebook | Twitter
Why we really need a search engine
… …
Batch! Good for complicated tasks (Machine Learning, Graph Algorithms, etc.)
INDEX BUILDING
1 WEEK → 3 HOURS
Mappers to build index
Build index using MR job and Bulk API
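As a rough illustration of what each mapper might send, here is a minimal sketch of serializing documents into the newline-delimited body that the Bulk API accepts at `/_bulk`. The index name `users`, the type `user`, and the `interests` field are hypothetical, and the `_type` key assumes an Elasticsearch version from the era of this talk (later versions dropped types). The sketch only builds the request body; it does not send it.

```python
import json


def bulk_body(docs, index, doc_type="user"):
    """Serialize (doc_id, source) pairs into a Bulk API request body."""
    lines = []
    for doc_id, source in docs:
        # Action line: says what to do with the document on the next line.
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(source))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Hypothetical documents, for illustration only.
    users = [("u1", {"interests": ["music"]}), ("u2", {"interests": ["sports"]})]
    print(bulk_body(users, index="users"), end="")
```

Building the body as plain text keeps the mapper independent of any particular client library; the same string can be posted with whatever HTTP client the job already uses.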
6 nodes, 24GB RAM
16GB for ES service
4 cores
3x 1.5TB drive
>1TB/index (replicated)
~300M documents
~5KB / document
~3 hours
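The figures above imply a sustained indexing rate that is worth working out. The following back-of-the-envelope sketch uses only the slide's approximate numbers and assumes 1KB = 1000 bytes. It gives about 1.5 TB of raw source data and roughly 28,000 documents per second across the cluster, or about 4,600 per node. The on-disk index size will differ from the raw source size.

```python
# Approximate figures taken from the slide.
DOCS = 300_000_000        # ~300M documents
DOC_BYTES = 5 * 1000      # ~5KB per document (assuming 1KB = 1000 bytes)
BUILD_SECONDS = 3 * 3600  # ~3 hours
NODES = 6

raw_bytes = DOCS * DOC_BYTES                  # total raw source size
docs_per_sec = DOCS / BUILD_SECONDS           # cluster-wide indexing rate
docs_per_sec_per_node = docs_per_sec / NODES  # rate each node must sustain
mb_per_sec = raw_bytes / BUILD_SECONDS / 1e6  # raw data rate into the cluster

print(f"raw source data:   {raw_bytes / 1e12:.1f} TB")
print(f"cluster docs/sec:  {docs_per_sec:,.0f}")
print(f"per-node docs/sec: {docs_per_sec_per_node:,.0f}")
print(f"raw MB/sec:        {mb_per_sec:.0f}")
```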
Monitoring: Zabbix
Monitoring: SPM
Parameter Optimization
Amount bulk indexed
# Shards
Time taken
CPU util.
Mem util.
Disk I/O
Network
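One way to read this slide is that the amount bulk indexed and the number of shards were the tuned parameters, while time taken and the utilization figures were the measured outcomes. That reading is an assumption. Under it, a parameter sweep could be sketched as below. `run_trial` is a hypothetical callback that would build a test index and return the measurements; nothing here reflects the actual tooling used.

```python
from itertools import product


def sweep(bulk_sizes, shard_counts, run_trial):
    """Run one trial per (bulk size, shard count) pair and rank by build time."""
    results = []
    for bulk_size, shards in product(bulk_sizes, shard_counts):
        # run_trial is supplied by the caller and returns a dict of measurements
        # that must include "time_taken_s".
        metrics = run_trial(bulk_size, shards)
        results.append({"bulk_size": bulk_size, "shards": shards, **metrics})
    # Fastest configuration first.
    return sorted(results, key=lambda row: row["time_taken_s"])


if __name__ == "__main__":
    def fake_trial(bulk_size, shards):
        # Placeholder numbers only. A real trial would build an index and read
        # CPU, memory, disk and network figures from the monitoring system.
        return {"time_taken_s": 1000.0 / shards + 100000.0 / bulk_size}

    for row in sweep([1000, 10000], [12, 48], fake_trial):
        print(row)
```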
Index Building: Learnings
• Bulk API
• No replicas
• 2 shards / CPU
• 10,000 documents (users) per indexing request
• Refresh off (index.refresh_interval = -1)
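The learnings above can be sketched as a create-index settings body plus a batching helper. This is a minimal illustration, not the configuration used in the talk. Reading "CPU" as a core count is an assumption, and the function and parameter names are hypothetical. The setting keys `number_of_shards`, `number_of_replicas` and `refresh_interval` are standard Elasticsearch index settings.

```python
def build_time_settings(cpu_count, shards_per_cpu=2):
    """Index settings tuned for bulk loading rather than for querying."""
    return {
        "settings": {
            "index": {
                # 2 shards per CPU, per the slide (CPU read as core: assumption).
                "number_of_shards": cpu_count * shards_per_cpu,
                # No replicas during the build; add them once loading is done.
                "number_of_replicas": 0,
                # -1 disables periodic refresh while indexing.
                "refresh_interval": "-1",
            }
        }
    }


def batches(items, size=10_000):
    """Yield lists of at most `size` items, one list per bulk request."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Final partial batch.
        yield batch


if __name__ == "__main__":
    print(build_time_settings(cpu_count=4))
    print([len(b) for b in batches(range(25_000))])
```

Keeping the batch size as a parameter makes it easy to vary while looking for the best throughput on a given cluster.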
QUERY PERFORMANCE
5 MINUTES → 10 SECONDS
Query Performance: Learnings
• 1-2 Replicas (and for reliability)
• Turn refresh on again (5s default)
• Warm up effect (Index Warm up API 0.20+)
• Optimize API
• Simulate multiple users
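After the bulk build, the steps above amount to a short sequence of REST calls. The sketch below only assembles them as (method, path, body) tuples and does not send anything. It assumes the endpoints of Elasticsearch versions from the era of this talk: `_optimize` and `_warmer` were later renamed or removed. The warmer name `warm_all`, the `match_all` warm-up query, `max_num_segments=1` and the ordering of the calls are illustrative choices, not taken from the slides. The `5s` value follows the slide.

```python
def post_build_requests(index, replicas=1, refresh_interval="5s"):
    """Return the REST calls that switch an index from build mode to query mode."""
    settings = {
        "index": {
            "number_of_replicas": replicas,        # 1-2 replicas, per the slide
            "refresh_interval": refresh_interval,  # turn refresh back on
        }
    }
    # A warmer is a search that is run against new segments before they serve
    # live traffic, so caches are already populated for real queries.
    warmer = {"query": {"match_all": {}}}
    return [
        # Merge segments first, while there is only one copy of each shard.
        ("POST", f"/{index}/_optimize?max_num_segments=1", None),
        ("PUT", f"/{index}/_settings", settings),
        ("PUT", f"/{index}/_warmer/warm_all", warmer),
    ]


if __name__ == "__main__":
    for method, path, body in post_build_requests("users"):
        print(method, path, body)
```

A warm-up query that resembles production traffic (same facets, same sort fields) would populate more useful caches than `match_all`.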
Warm Up: load into memory and cache
Other cool features
• Custom Scoring functions
• Scripts – MVEL, Python
• Facets
• Exploring:
  - Real-time indexing
  - Indexing images, files, etc.
  - Parent-child relationships
QUERIES?
Index Building over time