Elasticsearch in production

Alex Brasetvik @alexbrasetvik

Upload: foundsearch

Post on 10-May-2015

DESCRIPTION

Video available at http://www.youtube.com/watch?v=gkdfNl0WL-A

Original slides at http://presentations.found.no/berlin-buzzwords-2013/

This talk covers some of the lessons we've learned from securing and herding hundreds of Elasticsearch clusters. It is applicable whether you operate Elasticsearch in your own infrastructure, in the cloud, or if you're a developer who wants a better understanding of Elasticsearch's various failure modes.

Elasticsearch easily lets you develop amazing things, and it has gone to great lengths to make Lucene's features readily available in a distributed setting. However, when it comes to running Elasticsearch in production, you still have a fairly complicated system on your hands: a system with high expectations on network stability, a huge appetite for memory, and a system that assumes all users are trustworthy.

Instead of delving deeply into a few specifics, we give a brief overview of problems you are likely to run into and suggested solutions to these problems. We cover topics that are applicable to both developers and users with Elasticsearch clusters of every shape and size, with an emphasis on resiliency and security. Basic familiarity with Elasticsearch is assumed.

TRANSCRIPT

Page 2: Elasticsearch in production

How marketing thinks our users feel

Page 3: Elasticsearch in production

How we developers sometimes feel

Page 4: Elasticsearch in production

Who?

Co-founder of Found AS

7+ years of search, 2+ Elasticsearch

We manage hundreds of Elasticsearch clusters

… on Amazon's cloud

Page 5: Elasticsearch in production

Agenda

Memory (and stability)

Security (and multi-tenancy)

Networking (and reliability)

Client (and resiliency)

Page 6: Elasticsearch in production

Memory

Search engines crave memory

Caches, caches, caches

Field- and filter caches

Page cache

Index building

Page 7: Elasticsearch in production

PostgreSQL

Verifies resource usage

Safe >>> fast

Uses disk if necessary

Page 8: Elasticsearch in production

Elasticsearch trusts you

Built for speed

It'll jump if you ask it to

What could possibly go wrong?

Page 9: Elasticsearch in production

OutOfMemoryError

Woah there

I ate all the memories

Your cluster may or may not work any more

Page 10: Elasticsearch in production

May or may not work?

What else was happening at the time?

Corrupt cluster state, crashed Netty, …

In short: Don't end up there

Page 11: Elasticsearch in production

Warning signs?

Monitor cache sizes and heap space

Outgrowing page cache: gradual slowdown

Outgrowing heap space: sudden crash
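As a minimal sketch of the heap-space warning sign, the function below flags nodes whose JVM heap usage exceeds a threshold. It assumes the input is an already-parsed nodes stats response with `jvm.mem.heap_used_in_bytes` and `heap_max_in_bytes` per node (check the field names against your Elasticsearch version); the 75% threshold and the sample data are illustrative assumptions, not values from the talk.

```python
def nodes_over_heap_threshold(stats, threshold=0.75):
    """Return {node_name: heap_fraction} for nodes above the threshold.

    `stats` is assumed to be a parsed nodes stats response (a dict).
    """
    hot = {}
    for node_id, node in stats.get("nodes", {}).items():
        mem = node["jvm"]["mem"]
        fraction = mem["heap_used_in_bytes"] / mem["heap_max_in_bytes"]
        if fraction > threshold:
            # Fall back to the node id if the node has no name.
            hot[node.get("name", node_id)] = fraction
    return hot


GIB = 2**30

# Hypothetical sample data, shaped like a nodes stats response.
sample_stats = {
    "nodes": {
        "a1": {"name": "node-1", "jvm": {"mem": {
            "heap_used_in_bytes": 2 * GIB, "heap_max_in_bytes": 4 * GIB}}},
        "b2": {"name": "node-2", "jvm": {"mem": {
            "heap_used_in_bytes": 7 * GIB // 2, "heap_max_in_bytes": 4 * GIB}}},
    }
}

print(nodes_over_heap_threshold(sample_stats))
```

Because outgrowing the heap fails suddenly rather than gradually, a check like this is only useful if it alerts well before the heap is full.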

Page 12: Elasticsearch in production

Understand the memory profile

Test realistically

Bound cache sizes and flush thresholds

v0.90+ takes you longer with field filters, etc.

Page 13: Elasticsearch in production

Large heaps are expensive to garbage collect

Keep heap < 32GiB (but test!)

Lots of page cache is good, though!
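The trade-off between heap and page cache can be sketched as a small sizing helper. The 50% share of RAM is a commonly used rule of thumb and an assumption here, not something the slides state; the 31 GiB cap is one way of staying under the 32GiB line. Treat the output as a starting point for testing, not a recommendation.

```python
GIB = 2**30


def suggest_heap_bytes(total_ram_bytes, heap_share=0.5, heap_cap_bytes=31 * GIB):
    """Suggest a JVM heap size, leaving the rest of RAM to the page cache.

    heap_share and heap_cap_bytes are assumptions to be tuned by testing.
    """
    return min(int(total_ram_bytes * heap_share), heap_cap_bytes)


for ram_gib in (16, 64, 128):
    heap = suggest_heap_bytes(ram_gib * GIB)
    print(f"{ram_gib} GiB RAM: heap {heap / GIB:.0f} GiB, "
          f"page cache up to {(ram_gib * GIB - heap) / GIB:.0f} GiB")
```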

Page 14: Elasticsearch in production

Security

Elasticsearch trusts everyone

Not its job to do auth(z)

You're the gatekeeper

Page 15: Elasticsearch in production

_search

Read only?

Limit indexes / wrap with filters?

Protect the field caches
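One way a gatekeeper in front of `_search` might limit indexes and wrap queries with a filter is sketched below. The index allow-list, the `tenant_id` field, and the function name are hypothetical. The sketch uses the `bool` query with a `filter` clause from recent Elasticsearch versions; releases from the era of this talk used a `filtered` query for the same purpose.

```python
ALLOWED_INDEXES = {"products", "articles"}  # hypothetical allow-list


def restrict_search(index, user_query, tenant_id):
    """Build the search body to forward, or raise if the index is off-limits."""
    if index not in ALLOWED_INDEXES:
        raise PermissionError(f"index not allowed: {index!r}")
    # Only the query clause is forwarded. Sorting, aggregations and other
    # top-level keys from the caller are deliberately not passed through.
    return {
        "query": {
            "bool": {
                "must": [user_query],
                "filter": [{"term": {"tenant_id": tenant_id}}],
            }
        }
    }


body = restrict_search("products", {"match": {"title": "lamp"}}, "tenant-42")
print(body)
```

Building the outgoing body from scratch, instead of passing the caller's body along, also keeps callers from sorting or faceting on arbitrary fields and thereby loading them into the field caches.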

Page 16: Elasticsearch in production

Arbitrary code execution

Elasticsearch has powerful scripting

Not sandboxed

On by default
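If user-supplied request bodies must be accepted at all, a proxy could at least refuse anything that looks like a script. The check below is a heuristic sketch under stated assumptions: it walks a parsed JSON body and rejects any key named `script` or beginning or ending with `script_` / `_script`. It is not a substitute for disabling scripting or isolating the process, and it fails closed (a document field literally named `script` would also be rejected).

```python
def contains_script(body):
    """Return True if any key in the parsed JSON body looks script-related."""
    if isinstance(body, dict):
        for key, value in body.items():
            name = str(key).lower()
            if name == "script" or name.startswith("script_") or name.endswith("_script"):
                return True
            if contains_script(value):
                return True
        return False
    if isinstance(body, list):
        return any(contains_script(item) for item in body)
    return False


safe = {"query": {"match": {"description": "lamp"}}}
unsafe = {"query": {"match_all": {}},
          "script_fields": {"x": {"script": "hypothetical script source"}}}

print(contains_script(safe), contains_script(unsafe))
```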

Page 17: Elasticsearch in production

Any website can reach your machine

http://127.0.0.1:9200/_search?callback=capture&source=…

Run in a virtual machine

Page 18: Elasticsearch in production

Networking

Elasticsearch is distributed

Easy (for a distributed system)

Supports many usage patterns.

Page 19: Elasticsearch in production

Quite common topology

High availability, right?

Page 20: Elasticsearch in production

Obey or risk split brains …

… and irrecoverable data loss
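The majority rule behind avoiding split brains is simple arithmetic, sketched below. The slides do not name a setting; in Elasticsearch versions of that era the corresponding setting was `discovery.zen.minimum_master_nodes`, which is mentioned here as an assumption about what the quorum would be fed into.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Smallest number of nodes that forms a strict majority."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


for n in (1, 2, 3, 5):
    quorum = minimum_master_nodes(n)
    print(f"{n} master-eligible nodes: quorum {quorum}, "
          f"tolerates {n - quorum} node failure(s)")
```

Note that with two master-eligible nodes the quorum is also two, so a two-node cluster that obeys the rule cannot elect a master after losing either node.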

Page 22: Elasticsearch in production

Stormy clouds

Zone vs instance failure

Thundering herds

Optimizing MTTR is not HA
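A common client-side mitigation for thundering herds, not taken from the slides, is exponential backoff with random jitter, so that clients do not all reconnect at the same instant after a failure. The sketch below computes such delays; the base, cap, and attempt count are illustrative assumptions.

```python
import random


def backoff_delays(attempts, base=0.5, cap=30.0, rng=random):
    """Return one randomized delay (seconds) per retry attempt.

    Each delay is drawn uniformly from [0, min(cap, base * 2**attempt)],
    so retries from many clients spread out instead of arriving together.
    """
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * 2 ** attempt)
        delays.append(rng.uniform(0, ceiling))
    return delays


# A seeded generator keeps the illustration repeatable.
delays = backoff_delays(8, rng=random.Random(2013))
print([round(d, 2) for d in delays])
```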

Page 23: Elasticsearch in production

Client considerations

Idempotent/retry-able requests

Use a connection pool.

_bulk / _msearch
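To illustrate how idempotent requests and `_bulk` fit together, the sketch below builds a newline-delimited bulk body in which every document carries an explicit `_id`. With explicit IDs, re-sending the same body after a timeout overwrites the same documents rather than creating duplicates. The index name and documents are hypothetical, and older Elasticsearch versions also expect a `_type` in the action metadata.

```python
import json


def build_bulk_body(index, docs):
    """Build a _bulk request body from (doc_id, source) pairs.

    Each document becomes two lines: action metadata, then the source.
    The body must end with a newline.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


bulk_body = build_bulk_body("articles", [
    ("article-1", {"title": "Memory"}),
    ("article-2", {"title": "Security"}),
])
print(bulk_body, end="")
```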

Page 24: Elasticsearch in production

Have enough memory

Have a majority of nodes

Don't allow arbitrary search requests

Use retryable requests