Elasticsearch Sharding Strategy at Tubular Labs

How we arrived at a sharding strategy

Upload: tubular-labs

Post on 15-Apr-2017



TRANSCRIPT

Page 1: Elasticsearch Sharding Strategy at Tubular Labs

Elasticsearch Sharding Strategy at Tubular Labs: How we arrived at a sharding strategy

Page 2: Elasticsearch Sharding Strategy at Tubular Labs

Our Elasticsearch Infrastructure?

Page 3: Elasticsearch Sharding Strategy at Tubular Labs

Our Elasticsearch Clusters

• 3 clusters for search/aggregations
• 1 small autocomplete cluster
• 1 medium-sized cluster for internal use
• 1 Elastic Stack cluster

© 2016 Tubular Labs

Page 4: Elasticsearch Sharding Strategy at Tubular Labs

Our Largest Cluster

• 2.5 billion documents
• 4 TB, not including replicas
• Constant indexing load with periodic spikes
• Queries range from simple search requests to heavy terms aggregations
• Not many concurrent queries, but queries can be demanding
• The cluster is very CPU-heavy
• Recently migrated from Elasticsearch 1.7 to 2.3

Page 5: Elasticsearch Sharding Strategy at Tubular Labs

Migrating to 2.x is a good time to reconsider sharding

• We have to reindex anyway
• Our dataset has grown substantially
• Performance wasn’t great
• We don’t want to have to reindex in the near future

Page 6: Elasticsearch Sharding Strategy at Tubular Labs

Sharding Strategy

Page 7: Elasticsearch Sharding Strategy at Tubular Labs

Sharding Questions...

• How many shards should I have per index?
• How large should my shards be?
• How many shards should I have per node?
• What hardware/instance type should I use?

Page 8: Elasticsearch Sharding Strategy at Tubular Labs

It Depends...

• How large is your dataset?
• How fast will your dataset grow?
• What kinds of queries are you running?
• How fast will usage grow?
• When do you want to reindex next?
• I’m sure there are more...

Page 9: Elasticsearch Sharding Strategy at Tubular Labs

How do we get answers?


Page 10: Elasticsearch Sharding Strategy at Tubular Labs

Repeatable Elasticsearch Experiments

Page 11: Elasticsearch Sharding Strategy at Tubular Labs

Repeatable Elasticsearch Experiments: What We Want

• Repeatable
  • Others can easily run the same tests and should get about the same results
• Easily modified
• Easy to define and understand
• Easy to run
• Understandable results

Page 12: Elasticsearch Sharding Strategy at Tubular Labs

What is Rally?

• Benchmarking framework for Elasticsearch
• Easily define a set of repeatable tests
  • Tests are defined in JSON
• Compare different configurations
• Sets up a single-node cluster for tests, or targets existing (external) clusters
• Targeting external clusters is not fully supported, and you’ll get warnings telling you as much

Page 13: Elasticsearch Sharding Strategy at Tubular Labs

What is Rally?

Terms

• Track - a benchmarking scenario
• Car - the system (Elasticsearch) configuration for a benchmark
• Challenge - which benchmarks are run and how they are configured
• Race - an actual run of the benchmark
• Tournament - a way to analyze the impact of changes

Page 14: Elasticsearch Sharding Strategy at Tubular Labs

How does Rally work?

Example track config:

https://gist.github.com/mdelaney/b710fb3d25fabf7818f471bd4abe70a5

Page 15: Elasticsearch Sharding Strategy at Tubular Labs

Our Experiments and Results

Page 16: Elasticsearch Sharding Strategy at Tubular Labs

How we used this at Tubular Labs

NOTE: The following experiments are written as we would do them next time. Due to time constraints, we had to do some of this in parallel. I’ll also mention where we deviated from what is in the next few slides.

• We’re still pretty new at running benchmarks with Elasticsearch, so we’re still learning the best way to do this.
• Running these tests answered a lot of questions (and raised brand new ones)

Page 17: Elasticsearch Sharding Strategy at Tubular Labs

Determining a good shard size

How big should my shards be?

Page 18: Elasticsearch Sharding Strategy at Tubular Labs

Determining a good shard size

The experiment

1. Obtain a realistic data set
2. Write the Rally config to:
   • Index your data (single shard)
   • Run a set of common queries
3. Run the benchmark with different document counts
4. Graph the results
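The four steps above can be sketched as a small timing loop. This is only an illustration of the shape of the experiment, not the Rally configuration used in the talk: `load_docs` and the query callables are hypothetical hooks that a real run would replace with indexing into a single-shard index and with actual search requests.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_query(run_query: Callable[[], None], repeats: int = 5) -> float:
    """Return the median wall-clock latency of run_query, in milliseconds."""
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def sweep_document_counts(
    doc_counts: List[int],
    load_docs: Callable[[int], None],
    queries: Dict[str, Callable[[], None]],
    repeats: int = 5,
) -> Dict[str, Dict[int, float]]:
    """For each document count, load the data, then time every query."""
    results: Dict[str, Dict[int, float]] = {name: {} for name in queries}
    for count in doc_counts:
        # Hypothetical hook: index `count` documents into one shard.
        load_docs(count)
        for name, run_query in queries.items():
            results[name][count] = time_query(run_query, repeats)
    return results


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without a cluster.
    state = {"docs": 0}

    def fake_load(n: int) -> None:
        state["docs"] = n

    def fake_query() -> None:
        sum(range(state["docs"] // 1000))

    table = sweep_document_counts(
        [1_000_000, 5_000_000], fake_load, {"A": fake_query}
    )
    for name, by_count in table.items():
        for count, ms in sorted(by_count.items()):
            print(f"query {name}\t{count} docs\t{ms:.3f} ms")
```

The resulting table (query name, document count, median latency) is what step 4 would graph.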

Page 19: Elasticsearch Sharding Strategy at Tubular Labs

Determining a good shard size

The queries we used

• Queries A and B:
  • Very similar, but aggregate on slightly different sets of terms
  • Hit about 10% of our dataset
• Queries C and D:
  • Same aggregations as queries A and B
  • Run against the full dataset

Page 20: Elasticsearch Sharding Strategy at Tubular Labs

Determining a good shard size

Our results

Page 21: Elasticsearch Sharding Strategy at Tubular Labs

Determining a good shard size

We need to consider

• How fast do you need each query to be?
• How much do you expect your data set to grow before you want to look at reindexing again?
• Your use case will likely have other concerns as well
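One way to turn those two questions into a shard count is sketched below. The latency figures, the 600 ms target, and the 50% growth allowance are made-up placeholders, not results from the talk; only the 2.5 billion document total comes from the earlier slide. Both function names are hypothetical.

```python
import math
from typing import Dict, Optional


def largest_count_within_target(
    latency_ms_by_count: Dict[int, float], target_ms: float
) -> Optional[int]:
    """Largest benchmarked per-shard document count that still meets the target.

    Stops at the first count that misses the target, so a faster outlier at a
    larger count is not picked by accident.
    """
    best: Optional[int] = None
    for count in sorted(latency_ms_by_count):
        if latency_ms_by_count[count] > target_ms:
            break
        best = count
    return best


def shards_for_index(total_docs: int, expected_growth: float, docs_per_shard: int) -> int:
    """Primary shards needed so the grown dataset stays within docs_per_shard."""
    return math.ceil(total_docs * (1.0 + expected_growth) / docs_per_shard)


if __name__ == "__main__":
    # Placeholder measurements (document count -> latency in ms); NOT real results.
    measured = {5_000_000: 120.0, 10_000_000: 260.0, 20_000_000: 540.0, 40_000_000: 1150.0}
    per_shard = largest_count_within_target(measured, target_ms=600.0)
    print("docs per shard:", per_shard)
    if per_shard is not None:
        # 2.5 billion documents today, assumed to grow 50% before the next reindex.
        print("primary shards:", shards_for_index(2_500_000_000, 0.5, per_shard))
```

Sizing for the grown dataset rather than the current one is what defers the next reindex.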

Page 22: Elasticsearch Sharding Strategy at Tubular Labs

Determining how many shards per node

How many shards per node?

Page 23: Elasticsearch Sharding Strategy at Tubular Labs

Determining how many shards per node

The experiment (almost the same as before)

1. Obtain a realistic dataset
2. Write the Rally config to:
   • Index your data
   • Run a set of common queries
3. Run the benchmark with different shard counts
4. Graph the results

Page 24: Elasticsearch Sharding Strategy at Tubular Labs

Determining how many shards per node

What we did differently this time (time constraints)

• Used the Apache HTTP Benchmark Tool with a script to run the queries
• Our production cluster had 26 data nodes with about 200 million documents each
• Wanted to avoid expanding the cluster further if at all possible (c3.8xlarge is pricey!)
  • 10 total shards per node (about 20 million docs/shard)
  • 16 total shards per node (about 12.5 million docs/shard)
  • 32 total shards per node (about 6.25 million docs/shard)
• Tested on 3-node clusters (2 data nodes, 1 client/master)
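The docs-per-shard figures follow directly from the roughly 200 million documents held on each data node. A short check of that arithmetic, using only numbers from the slide (the function names are illustrative):

```python
DOCS_PER_NODE = 200_000_000  # "about 200 million documents each"
DATA_NODES = 26              # production data nodes


def docs_per_shard(docs_per_node: int, shards_per_node: int) -> float:
    """Average documents per shard when a node's documents are split evenly."""
    return docs_per_node / shards_per_node


def cluster_shard_copies(data_nodes: int, shards_per_node: int) -> int:
    """Total shard copies across the cluster at a given shards-per-node setting."""
    return data_nodes * shards_per_node


if __name__ == "__main__":
    for shards in (10, 16, 32):
        millions = docs_per_shard(DOCS_PER_NODE, shards) / 1e6
        total = cluster_shard_copies(DATA_NODES, shards)
        print(f"{shards} shards/node: {millions:.2f} million docs/shard, "
              f"{total} shard copies on {DATA_NODES} nodes")
```

Keeping the node count fixed at 26 means every option trades shard size against the number of shards each node has to serve.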

Page 25: Elasticsearch Sharding Strategy at Tubular Labs

Our Results - Testing Number of Shards per node

[Figures: Query response by shard count (C 1); Query response by shard count (C 3)]

Page 26: Elasticsearch Sharding Strategy at Tubular Labs

Our Results - Testing Number of Shards per node

[Figures: Query response production vs test (C 1); Query response production vs test (C 3)]

• Production - 26 data nodes
• Test Cluster - 2 data nodes

Page 27: Elasticsearch Sharding Strategy at Tubular Labs

New Questions Raised

• Why was there a significant performance drop at each level of testing?
• A single shard on a single node performed much better than our multiple-shards-per-node tests
• The fully loaded 3-node cluster performed much better than our full cluster in production
• Impact of moving to a machine with more memory
  • Will the extra file system cache make a large difference?

Page 28: Elasticsearch Sharding Strategy at Tubular Labs

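Current path of performance investigation

Query load isn’t evenly distributed

[Figure: layout of 15 shards, two copies each (one copy of every shard marked *), in three groups of ten]

Group 1: 1, 4, 3*, 2*, 5*, 8*, 10, 13*, 11, 6*
Group 2: 2, 5, 7*, 4*, 10*, 9*, 11*, 12*, 14, 15
Group 3: 3, 6, 1*, 9, 13, 8, 12, 7, 15*, 14*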
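The imbalance in the shard numbers on this slide can be checked by counting. The sketch below assumes the three blocks of numbers correspond to three nodes and that `*` marks one particular copy of each shard; the transcript does not say whether that copy is the replica or the copy that served a query, so the interpretation is an assumption.

```python
from collections import Counter
from typing import Dict, List

# Shard copies as listed on the slide, grouped into the three blocks
# (assumed here to be three nodes).
LAYOUT: Dict[str, List[str]] = {
    "group 1": ["1", "4", "3*", "2*", "5*", "8*", "10", "13*", "11", "6*"],
    "group 2": ["2", "5", "7*", "4*", "10*", "9*", "11*", "12*", "14", "15"],
    "group 3": ["3", "6", "1*", "9", "13", "8", "12", "7", "15*", "14*"],
}


def starred_per_group(layout: Dict[str, List[str]]) -> Dict[str, int]:
    """Count the copies marked with '*' in each group."""
    return {
        group: sum(1 for copy in copies if copy.endswith("*"))
        for group, copies in layout.items()
    }


def copies_per_shard(layout: Dict[str, List[str]]) -> Counter:
    """Count how many copies of each shard number appear across all groups."""
    return Counter(copy.rstrip("*") for copies in layout.values() for copy in copies)


if __name__ == "__main__":
    print(starred_per_group(LAYOUT))
    print(copies_per_shard(LAYOUT))
```

Under either reading of `*`, the marked copies are split 6, 6, and 3 across the groups rather than 5 each, which matches the slide's point that load is not spread evenly.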

Page 29: Elasticsearch Sharding Strategy at Tubular Labs

Problems We Encountered

Page 30: Elasticsearch Sharding Strategy at Tubular Labs

Problems We Encountered?

Rally related

• With nested documents, the document count in track.json does not match the document count Rally checks at the end of indexing
• Multi-node support is not yet available

Page 31: Elasticsearch Sharding Strategy at Tubular Labs

Problems We Encountered?

Non-Rally related

• Performance in reality wasn’t as good as our testing suggested it should be
  • We haven’t found the reason for this yet
  • We’ve noticed a correlation between the number of shards a query hits per node and the time taken to run the query on the shard, but have not yet identified the bottleneck
  • We were able to mitigate this by adding data nodes

Page 32: Elasticsearch Sharding Strategy at Tubular Labs

Thank You! Questions?