Elasticsearch in Production: New York Meetup at Twitter, October 2014
Posted on 20-Jun-2015
Elasticsearch in production!
Konrad Beiske konrad@found.no
@beiske
Who?
Senior software engineer at Found AS
Working with Elasticsearch for 2 years
Herding hundreds of Elasticsearch clusters
Agenda
• Anti-patterns
• Memory / Resource Usage
• Distributed problems
• Security
• Client concerns
• Changing a cluster
found.no/foundation
Snapshot / Restore
Circuit breakers
Document values
Aggregations
Distributed percolation
Suggesters
…
Anti-Patterns
Arbitrary Keys
• “Schema Free”
• One field per value
• Ever-growing cluster state
acls:
  1234: READ
  42: WRITE
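One way to avoid the one-field-per-value pattern above is to move the changing values out of the keys. A minimal sketch, assuming the field names `user_id` and `permission` (both hypothetical, not from the talk):

```python
def acls_to_fixed_fields(acls):
    """Turn {user_id: permission} into a list of objects with two fixed field names.

    The mapping then holds two fields no matter how many users exist,
    instead of gaining one new field per user id.
    """
    return [
        {"user_id": str(user_id), "permission": permission}
        for user_id, permission in sorted(acls.items(), key=lambda kv: str(kv[0]))
    ]


# The document shape from the slide, with user ids used as keys.
doc = {"acls": {"1234": "READ", "42": "WRITE"}}

# Reshaped so that the keys are always the same two names.
reshaped = {"acls": acls_to_fixed_fields(doc["acls"])}
print(reshaped)
```

For `user_id` and `permission` to match as a pair at query time, the `acls` field would probably need to be mapped as nested objects.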
Heavy Updating
• Update = Delete + Reindex
• Be careful with counters
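Since every update is a delete plus a reindex, incrementing a counter once per event rewrites the whole document each time. A sketch of one possible mitigation, buffering increments in the client and sending one update per document per flush. The class name `CounterBuffer` is hypothetical, and how the flushed batch is sent to Elasticsearch is left out:

```python
from collections import Counter


class CounterBuffer:
    """Collects counter increments in memory so they can be applied in batches."""

    def __init__(self):
        self._pending = Counter()

    def increment(self, doc_id, amount=1):
        # No request is made here; the increment is only recorded locally.
        self._pending[doc_id] += amount

    def flush(self):
        """Return one (doc_id, total_increment) pair per document and reset the buffer."""
        batch = sorted(self._pending.items())
        self._pending.clear()
        return batch


buf = CounterBuffer()
for doc_id in ["a", "b", "a", "a"]:
    buf.increment(doc_id)

# Four events collapse into two document updates.
updates = buf.flush()
print(updates)
```

The trade-off is that counts in the index lag behind by up to one flush interval, and buffered increments are lost if the client crashes before flushing.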
Slow queries
• WHERE foo ILIKE '%bar%'
• {"query_string": {"query": "foo:*bar*"}}
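A rough illustration of why the leading wildcard hurts. This is a simplified model of a sorted term dictionary, not Lucene's actual data structure: a prefix can be located by binary search, while a substring pattern has to test every term.

```python
from bisect import bisect_left


def prefix_matches(sorted_terms, prefix):
    """Find terms starting with prefix: binary search, then a short forward scan."""
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order means no later term can match
        matches.append(term)
    return matches


def substring_matches(sorted_terms, needle):
    """Find terms containing needle anywhere: every term must be inspected."""
    return [term for term in sorted_terms if needle in term]


terms = sorted(["bar", "barn", "crowbar", "foo", "rebar", "zebra"])

print(prefix_matches(terms, "bar"))     # like foo:bar*
print(substring_matches(terms, "bar"))  # like foo:*bar*
```

With a handful of terms the difference is invisible; with millions of distinct terms per field the full scan is what makes the query slow.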
Arbitrary searches
query:
  filtered:
    filter:
      term:
        user_id: 42
    query: [user’s query here]
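A sketch of a more restrictive alternative to pasting the user's query into the request: accept only plain text and build the query server-side, using the same `filtered` structure as the slide. The function name and the `title` field are assumptions for illustration.

```python
def build_user_search(user_id, user_text, field="title"):
    """Build a filtered query from plain user text rather than a user-supplied query body.

    The user controls only the text being matched, never the query type,
    so wildcards, scripts, or deeply nested queries cannot be injected.
    """
    return {
        "query": {
            "filtered": {
                # Restricts results to documents owned by this user.
                "filter": {"term": {"user_id": user_id}},
                # "title" is a hypothetical field name.
                "query": {"match": {field: user_text}},
            }
        }
    }


body = build_user_search(42, "foo *bar*")
print(body)
```

Here `*bar*` is simply analyzed as text by the `match` query; it is not interpreted as a wildcard pattern.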
Time Bomb
Memory
• Field caches
• Filter caches
• Page caches
• Aggregations
• Index building
Page Cache
• Keeping index pages in memory
• Can’t have too much
• Outgrow: Gradual slowdown
Heap Space
• Memory used by Elasticsearch process
• Field / Filter caches
• Aggregations
Time Bomb
OutOfMemoryError
Woah there
I ate all the memories
Your cluster may or may not work any more
OutOfMemory
• Growing too big
• Selecting too large a timespan in Kibana
• Document ingestion peak
Preventing OOMs
• Have enough memory :-)
• Understand your search’s memory profile
• Bulk / Circuit breaker settings
• Monitoring
• Document values
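On the bulk side, one way to keep ingestion peaks from producing very large requests is to cap the size of each bulk body on the client. A minimal sketch, assuming the newline-delimited bulk format; the function name, the index and type names, and the 1 MB default are illustrative, not recommendations from the talk:

```python
import json


def bulk_bodies(docs, index, doc_type, max_bytes=1_000_000):
    """Yield newline-delimited bulk bodies, each at most about max_bytes.

    A single document larger than max_bytes is still sent, alone in its own body.
    """
    lines, size = [], 0
    for doc in docs:
        action = json.dumps({"index": {"_index": index, "_type": doc_type}})
        entry = action + "\n" + json.dumps(doc) + "\n"
        entry_size = len(entry.encode("utf-8"))
        # Start a new body if adding this entry would exceed the cap.
        if lines and size + entry_size > max_bytes:
            yield "".join(lines)
            lines, size = [], 0
        lines.append(entry)
        size += entry_size
    if lines:
        yield "".join(lines)


docs = [{"n": i} for i in range(10)]
bodies = list(bulk_bodies(docs, "logs", "event", max_bytes=200))
print(len(bodies), "bulk requests")
```

A suitable cap depends on document size and heap; as with sizing in general, it is something to test rather than guess.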
Marvel (/_stats)
Document Values
"my_field": {
  "type": "string",
  "fielddata": {
    "format": "doc_values"
  }
}
Sizing
• Test, don’t guess
• Start big, scale down
• Index, search, monitor
Glitch Meltdown
• Tie-breaker can be a cheap master-node
• Applies to data centers / availability zones too
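The tie-breaker point can be checked with simple majority arithmetic. A sketch, assuming the usual quorum rule of more than half of the master-eligible nodes (in Elasticsearch 1.x this corresponds to the `discovery.zen.minimum_master_nodes` setting); the function names are illustrative:

```python
def minimum_master_nodes(master_eligible):
    """Smallest strict majority of the master-eligible nodes."""
    return master_eligible // 2 + 1


def can_elect_master(partition_size, master_eligible):
    """True if one side of a network partition still holds a majority."""
    return partition_size >= minimum_master_nodes(master_eligible)


# Two master-eligible nodes, one per data center: a partition leaves
# 1 node on each side and neither side can elect a master.
print(can_elect_master(1, 2))

# Add a cheap tie-breaker in a third location: whichever side can still
# reach the tie-breaker has 2 of 3 and keeps working.
print(can_elect_master(2, 3))
```

With an even number of master-eligible nodes a clean split stops both halves, which is why a small third node is enough to make the difference.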
Data-only nodes
Master-only nodes
Jepsen
• Kyle Kingsbury’s series on distributed systems
• Distributed systems are hard
• aphyr.com
Security
• “Not my job!” – Elasticsearch
• That’s fine!
Dynamic Scripts
• Scoring
• Aggregations
• Updating
Dynamic Scripts
Runtime.getRuntime().exec(…)
Security
• Disable dynamic scripts
• Mind index patterns
• Even then, don’t accept arbitrary requests
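One reading of "mind index patterns" is that a proxy in front of Elasticsearch should check the index part of a request against an explicit list, since wildcards, `_all`, and comma-separated names can reach indices the caller was never meant to see. A minimal sketch; the index names and the function are hypothetical:

```python
# Hypothetical list of indices that clients may search.
ALLOWED_INDICES = {"products", "articles"}


def is_allowed_index(index_expr):
    """Accept only expressions made up entirely of explicitly allowed index names.

    Wildcards such as "*" or "prod*", "_all", and unknown names are rejected
    because none of them appear literally in the allow list.
    """
    return all(part in ALLOWED_INDICES for part in index_expr.split(","))


for expr in ["products", "products,articles", "*", "_all", "products,secrets", "prod*"]:
    print(expr, is_allowed_index(expr))
```

This covers only the index name. Request bodies still need the same care, in line with the last bullet above.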
Client Concerns
• Connection pools
• Idempotent requests
• Have sane syncing/indexing strategies
# BOOM!
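A sketch of one way to make indexing requests idempotent: derive the document ID from the record's identity in the source system, so that a retried or replayed request overwrites the same document instead of creating a duplicate. The function name and the choice of SHA-1 are assumptions, not something the talk prescribes.

```python
import hashlib
import json


def deterministic_id(source_system, record_key):
    """Derive a stable document ID from where the record came from and its key."""
    raw = json.dumps([source_system, record_key], separators=(",", ":"))
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


def index_action(source_system, record_key):
    """Bulk action line that always targets the same _id for the same record."""
    return {"index": {"_id": deterministic_id(source_system, record_key)}}


first = index_action("orders-db", 1001)
retry = index_action("orders-db", 1001)
other = index_action("orders-db", 1002)

# A retry addresses the same document; a different record does not.
print(first == retry, first == other)
```

With auto-generated IDs, a request that timed out on the client but succeeded on the server produces a duplicate when retried; with a derived ID the retry is harmless.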
Cluster changes
• Make new nodes join existing cluster
• No rolling restarts
• Easy rollback if things go bad
[Diagram sequence: nodes labeled v1.0.0 and v1.0.1]
Cluster changes
• Test first
• Mind the gateway recover_after_* settings
Multi-Cluster Workflows
• Snapshot/Restore
• Operations across clusters
• Swap clusters!
• Works well with good syncing strategy
Misc
• Same JVM
• ulimits
• Unicast and cluster name
• SSD? Use the noop I/O scheduler
@foundsays
Learn More!
found.no/foundation
@beiske