Elasticsearch in Production: New York Meetup at Twitter, October 2014

Post on 20-Jun-2015

DESCRIPTION

Elasticsearch easily lets you develop amazing things, and it has gone to great lengths to make Lucene's features readily available in a distributed setting. However, when it comes to running Elasticsearch in production, you still have a fairly complicated system on your hands: a system with high demands on network stability, a huge appetite for memory, and a system that assumes all users are trustworthy. This talk will cover some of the lessons we've learned from securing and herding hundreds of Elasticsearch clusters.

TRANSCRIPT

Elasticsearch in production!

Konrad Beiske konrad@found.no

@beiske

Who?

Senior software engineer at Found AS

Working with Elasticsearch for 2 years

Herding hundreds of Elasticsearch clusters

Agenda

• Anti-patterns

• Memory / Resource Usage

• Distributed problems

• Security

• Client concerns

• Changing a cluster

found.no/foundation

Snapshot / Restore

Circuit breakers

Document values

Aggregations

Distributed percolation

Suggesters

Anti-Patterns

Arbitrary Keys

• “Schema Free”

• One field per value

• Ever-growing cluster state

acls:
  1234: READ
  42: WRITE
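The acls example uses user IDs as field names, so every new ID becomes a new field in the mapping, and the mapping is part of the cluster state. A minimal sketch of one way around this, assuming the IDs can be moved into values so that the set of field names stays fixed. The function name and the field names `principal` and `permission` are hypothetical and not from the talk.

```python
def acls_to_entries(acls):
    """Turn {1234: "READ", 42: "WRITE"} into a list of fixed-shape entries.

    "principal" and "permission" are illustrative field names (an assumption).
    Entries are sorted by the string form of the ID to keep output stable.
    """
    return [
        {"principal": str(principal), "permission": permission}
        for principal, permission in sorted(
            acls.items(), key=lambda item: str(item[0])
        )
    ]


# The document now has two field names no matter how many users exist.
doc = {"acls": acls_to_entries({1234: "READ", 42: "WRITE"})}
print(doc)
```

Matching a specific principal together with its permission would likely call for a nested mapping, since plain object arrays are flattened at index time.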

Heavy Updating

• Update = Delete + Reindex

• Be careful with counters
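Because every update is a delete plus a reindex, incrementing a counter on each event rewrites the whole document each time. One possible mitigation, sketched here under the assumption that slightly stale counts are acceptable, is to accumulate increments in the client and apply them in batches. `CounterBuffer` is a hypothetical helper; how the flushed totals are sent to the cluster is left out.

```python
from collections import Counter


class CounterBuffer:
    """Accumulate increments in memory and hand them out in batches.

    Hypothetical helper: applying the flushed totals to the index
    (for example one update per document per flush) is up to the caller.
    """

    def __init__(self):
        self._pending = Counter()

    def increment(self, doc_id, amount=1):
        self._pending[doc_id] += amount

    def flush(self):
        """Return the accumulated increments and reset the buffer."""
        batch = dict(self._pending)
        self._pending.clear()
        return batch


buf = CounterBuffer()
for doc_id in ["a", "b", "a", "a"]:
    buf.increment(doc_id)

# Four increments collapse into one pending update per distinct document.
print(buf.flush())
```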

Slow queries

• WHERE foo ILIKE '%bar%'

• {"query_string": {"query": "foo:*bar*"}}
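A leading wildcard cannot use the sorted term dictionary, so the query has to inspect every term in the field. A common alternative is to index character n-grams, which turns a substring search into ordinary term lookups. The sketch below is a pure-Python illustration of that idea, not Elasticsearch's analyzer; all function names are made up for the example.

```python
def ngrams(text, n=3):
    """Return the set of lowercase character n-grams in text."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_index(docs, n=3):
    """Map each n-gram to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for gram in ngrams(text, n):
            index.setdefault(gram, set()).add(doc_id)
    return index


def substring_candidates(index, term, n=3):
    """Intersect the postings of the term's n-grams.

    The result is a candidate set: a document can contain every n-gram
    without containing the full term, so a final check may be needed.
    """
    grams = ngrams(term, n)
    if not grams:
        return set()  # term shorter than n: not handled in this sketch
    postings = [index.get(gram, set()) for gram in grams]
    return set.intersection(*postings)


docs = {1: "foobar", 2: "barbecue", 3: "rhubarb"}
index = build_index(docs)
print(substring_candidates(index, "obar"))
```

The trade-off is a larger index in exchange for lookups that no longer scan the whole term dictionary.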

Arbitrary searches

query:
  filtered:
    filter:
      term:
        user_id: 42
    query: [user’s query here]
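The request body on this slide can be sketched as a small builder that wraps whatever the user sent in a `user_id` filter. This uses the `filtered` query as written on the slide, which is the syntax of the Elasticsearch releases of that period; the function name is hypothetical.

```python
import json


def scoped_query(user_id, user_query):
    """Wrap a user-supplied query clause in a term filter on user_id."""
    if not isinstance(user_query, dict):
        raise TypeError("user_query must be a parsed query object (dict)")
    return {
        "query": {
            "filtered": {
                "filter": {"term": {"user_id": user_id}},
                "query": user_query,
            }
        }
    }


body = scoped_query(42, {"query_string": {"query": "foo:*bar*"}})
print(json.dumps(body, indent=2))
```

Note what the wrapper does not do: the filter limits which documents can match, but it puts no bound on how expensive the inner query is, so an arbitrary user query can still be slow or memory-hungry.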

Time Bomb

Memory

• Field caches

• Filter caches

• Page caches

• Aggregations

• Index building

Page Cache

• Keeping index pages in memory

• Can’t have too much

• Outgrow: Gradual slowdown

Heap Space

• Memory used by Elasticsearch process

• Field / Filter caches

• Aggregations

Time Bomb

OutOfMemoryError

Woah there

I ate all the memories

Your cluster may or may not work any more

OutOfMemory

• Growing too big

• Selecting too large a time span in Kibana

• Document ingestion peak

Preventing OOMs

• Have enough memory :-)

• Understand your search’s memory profile

• Bulk / Circuit breaker settings

• Monitoring

• Document values
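For the monitoring bullet, a minimal sketch of a heap check. It assumes a stats document shaped like a node stats response, with a `jvm.mem.heap_used_percent` value per node; the 75 percent threshold and the sample data are arbitrary choices for illustration, and fetching the stats over HTTP is left out.

```python
def nodes_over_heap_threshold(node_stats, threshold_percent=75):
    """Return (node name, heap percent) pairs at or above the threshold.

    Results are sorted with the fullest heap first. Nodes without a
    heap_used_percent value are skipped.
    """
    flagged = []
    for node_id, node in node_stats.get("nodes", {}).items():
        percent = node.get("jvm", {}).get("mem", {}).get("heap_used_percent")
        if percent is not None and percent >= threshold_percent:
            flagged.append((node.get("name", node_id), percent))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Made-up sample in the assumed shape.
sample = {
    "nodes": {
        "n1": {"name": "data-1", "jvm": {"mem": {"heap_used_percent": 91}}},
        "n2": {"name": "data-2", "jvm": {"mem": {"heap_used_percent": 40}}},
        "n3": {"name": "data-3", "jvm": {"mem": {"heap_used_percent": 78}}},
    }
}
print(nodes_over_heap_threshold(sample))
```

A check like this only helps if it runs often enough to catch growth before a large aggregation or ingestion peak pushes a node over the edge.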

Marvel (/_stats)

Document Values

"my_field": {
  "type": "string",
  "fielddata": { "format": "doc_values" }
}
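When many fields need the same treatment, the mapping fragment above can be generated rather than written by hand. A small sketch that reproduces the slide's syntax for a list of field names; the helper and the example field names are hypothetical, and the mapping format is the one shown on the slide, which newer Elasticsearch versions may not accept.

```python
import json


def doc_values_properties(field_names, field_type="string"):
    """Build a properties block using the fielddata format from the slide."""
    return {
        name: {"type": field_type, "fielddata": {"format": "doc_values"}}
        for name in field_names
    }


# "status" and "country" are made-up field names.
properties = doc_values_properties(["status", "country"])
print(json.dumps(properties, indent=2))
```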

Sizing

• Test, don’t guess

• Start big, scale down

• Index, search, monitor

Glitch Meltdown

• Tie-breaker can be a cheap master-node

• Applies to data centers / availability zones too
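The reason a tie-breaker helps comes down to majority arithmetic: a side of a network partition may only elect a master if it can see more than half of the master-eligible nodes. A minimal sketch of that rule follows; the function names are made up, and in the releases of that period the resulting value was what one would set as `discovery.zen.minimum_master_nodes`.

```python
def minimum_master_nodes(master_eligible):
    """Smallest node count that is a strict majority of master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def can_elect_master(visible, total_master_eligible):
    """True if a partition seeing `visible` master-eligible nodes has a quorum."""
    return visible >= minimum_master_nodes(total_master_eligible)


for total in (2, 3):
    quorum = minimum_master_nodes(total)
    print(f"{total} master-eligible nodes: quorum {quorum}, "
          f"tolerates losing {total - quorum}")
```

With two nodes the quorum is two, so losing either node stops the cluster (or, with a quorum of one, both halves can elect a master). Adding a third, cheap master-only node keeps the quorum at two while allowing one node or one zone to drop out.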

Data-only nodes

Master-only nodes

Jepsen

• Kyle Kingsbury’s series on distributed systems

• Distributed systems are hard

• aphyr.com

Security

• “Not my job!” – Elasticsearch

• That’s fine!

Dynamic Scripts

• Scoring

• Aggregations

• Updating

Dynamic Scripts

Runtime.getRuntime().exec(…)

Security

• Disable dynamic scripts

• Mind index patterns

• Even then, don’t accept arbitrary requests
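One way to act on "mind index patterns" and "don't accept arbitrary requests" is to have the application layer map each request onto a fixed allowlist before anything reaches the cluster. A minimal sketch, assuming a proxy or backend sits between users and Elasticsearch; the index names and the function are hypothetical.

```python
ALLOWED_INDICES = frozenset({"products", "articles"})  # hypothetical names


def resolve_index(requested, allowed=ALLOWED_INDICES):
    """Return `requested` only if it is literally one of the allowed names.

    Wildcards ("logs-*"), comma-separated lists and "_all" are rejected
    simply because they are not exact members of the allowlist.
    """
    if requested not in allowed:
        raise PermissionError(f"index not allowed: {requested!r}")
    return requested


for candidate in ["products", "*", "products,secrets", "_all"]:
    try:
        print("ok:", resolve_index(candidate))
    except PermissionError as error:
        print("rejected:", error)
```

Exact matching is deliberately stricter than pattern matching: it avoids having to reason about every way a multi-index expression can expand.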

Client Concerns

• Connection pools

• Idempotent requests

• Have sane syncing/indexing strategies
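Idempotent indexing usually comes down to choosing the document ID on the client. If a timed-out request is retried with an ID the client computed itself, the retry overwrites the first attempt instead of creating a duplicate. A sketch under the assumption that each record has a stable key in its source system; the helper names are hypothetical, and a plain dict stands in for the index.

```python
import hashlib
import json


def deterministic_id(source_system, source_key):
    """Derive a stable document id from where the record came from."""
    raw = json.dumps([source_system, source_key], separators=(",", ":"))
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


def index_document(store, doc_id, document):
    """Stand-in for an index request with an explicit id: same id replaces."""
    store[doc_id] = document


store = {}
doc_id = deterministic_id("orders-db", 1001)

# Simulate a retry after a timeout: the same request is sent twice.
index_document(store, doc_id, {"total": 99})
index_document(store, doc_id, {"total": 99})

print(len(store))
```

The same property makes a full resync from the source of truth safe to run at any time, which ties in with having a sane syncing strategy.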

# BOOM!

Cluster changes

• Make new nodes join existing cluster

• No rolling restarts

• Easy rollback if things go bad

[Diagram: cluster nodes running v1.0.0 and v1.0.1]

Cluster changes

• Test first

• Mind the recover_* settings (e.g. gateway.recover_after_nodes)

Multi-Cluster Workflows

• Snapshot/Restore

• Operations across clusters

• Swap clusters!

• Works well with good syncing strategy

Misc

• Same JVM

• ulimits

• Unicast and cluster name

• SSD? noop-scheduler

@foundsays

Learn More!

found.no/foundation

Follow @beiske
