elastic meetup june16
TRANSCRIPT
1
Miguel Bosin Support Engineer, @miguelbosin
Hot/Warm Architecture + Sizing
2
Intro
• Miguel Bosin
– Support engineer
– Joined in 2015
– Interested in technology
– Passionate about support
• Elastic
– Founded in 2012
– Distributed company
– Elasticsearch: what is it?
– Open source: Elasticsearch, Logstash, Kibana and Beats
– Commercial: X-Pack
4
What is it?
An open-source, distributed, scalable, highly available, document-oriented (JSON), RESTful full-text search engine with real-time search and analytics capabilities.
5
Agenda
1. Elastic overview
2. Elasticsearch basic architecture
3. Sizing introduction
4. Hot/Warm architecture
6
Overview of Elastic's current products
8
Elasticsearch terminology
• A node is a single Elasticsearch instance running in a single JVM
• Multiple nodes can form a cluster
• A cluster or a node can manage multiple indices
• An index is a container for data
• A shard is a single piece of an Elasticsearch index
• A shard is either a primary or a replica
9
Elasticsearch terminology II
10
Elasticsearch terminology III
11
Elasticsearch Architecture: Node roles
Master node:
• coordinates the cluster
• is the only node able to apply changes to the cluster state
• publishes the updated cluster state to all nodes
Data node:
• performs indexing
• can allocate shards locally
• knows the cluster state
12
Elasticsearch Architecture: Node roles II
Client node:
• does NOT perform indexing or allocate shards locally
• does NOT perform cluster management operations
• knows the cluster state
• acts as a smart load balancer (e.g. load balancing Kibana searches)
• redirects operations to the nodes that hold the relevant data
• calculates aggregation results
13
Elasticsearch Architecture: Node roles III
Node roles are set in elasticsearch.yml
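To make the point about elasticsearch.yml concrete, here is a minimal sketch that prints the two boolean settings (node.master and node.data) commonly used in Elasticsearch 1.x/2.x to express the three roles from the previous slides. The role names used as dictionary keys and the helper render_yaml are made up for illustration and are not taken from the talk.

```python
# Sketch only: maps each role from the slides to the node.master / node.data
# settings of Elasticsearch 1.x/2.x. The role keys are illustrative names.
ROLE_SETTINGS = {
    "dedicated_master": {"node.master": True, "node.data": False},
    "data": {"node.master": False, "node.data": True},
    "client": {"node.master": False, "node.data": False},
}


def render_yaml(role):
    """Return the elasticsearch.yml lines for the given role."""
    settings = ROLE_SETTINGS[role]
    return "\n".join(
        f"{key}: {str(value).lower()}" for key, value in settings.items()
    )


if __name__ == "__main__":
    for role in ROLE_SETTINGS:
        print(f"# {role}")
        print(render_yaml(role))
```

A node that leaves both settings at their defaults is master-eligible and holds data at the same time, which is the situation dedicated master nodes are meant to avoid on larger clusters.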
14
Architecture: node roles
15
Architecture: node roles
16
Architecture special case: dedicated master nodes
17
Dedicated master nodes: why / minimum_master_nodes
Indexing and searching data is CPU-, memory-, and I/O-intensive work, which can put pressure on a node's resources
Avoiding split brain: two active master nodes in the same cluster can lead to DATA LOSS
Set discovery.zen.minimum_master_nodes to the quorum of master-eligible nodes (using integer division):
(master_eligible_nodes / 2) + 1
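The quorum formula can be checked with a few lines of Python. This is a small sketch, assuming the division is integer division (so 3 master-eligible nodes give a quorum of 2); the function name is illustrative.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum of master-eligible nodes: (n / 2) + 1 with integer division."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, "master-eligible nodes -> quorum", minimum_master_nodes(n))
```

With only 2 master-eligible nodes the quorum is also 2, so losing either node leaves the cluster without a master; this is one reason 3 dedicated master nodes is the usual minimum.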
19
Sizing: general factors (server capacity)
• Disks (SSD vs. HDD)
• RAM
– 1/2 of the total RAM for ES
– ES heap size max: 30.5 GB
• Number of CPU cores
– ES thread pools concept
– 1 shard -> gets 1 thread -> 1 Java process -> 1 core
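As a quick illustration of the RAM rule, the sketch below applies the two figures from the slide: half of the machine's RAM for the Elasticsearch heap, capped at 30.5 GB. The function and constant names are illustrative, not part of any Elasticsearch tooling.

```python
MAX_HEAP_GB = 30.5  # upper bound for the heap quoted on the slide


def recommended_heap_gb(total_ram_gb):
    """Half of the total RAM, but never more than MAX_HEAP_GB."""
    if total_ram_gb <= 0:
        raise ValueError("total RAM must be positive")
    return min(total_ram_gb / 2, MAX_HEAP_GB)


if __name__ == "__main__":
    for ram in (16, 64, 128):
        print(f"{ram} GB RAM -> {recommended_heap_gb(ram)} GB heap")
```

The remaining RAM is not wasted: the operating system uses it for the filesystem cache, which Lucene relies on heavily.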
20
Sizing: Elasticsearch factors (logging case)
• Size of shards
• Number of shards on each node
• Retention period of data
• Mapping configuration: which fields are searchable, whether _source is enabled or not, etc.
• Average size of the documents
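To show how some of these factors combine, here is a deliberately rough sketch of a raw data volume estimate for a logging case. It assumes a constant daily document count and ignores everything the mapping configuration changes (index overhead, compression, disabled _source), so it is a starting point for the tests described next, not a prediction of actual disk usage. All names and the formula itself are assumptions made for illustration.

```python
def raw_storage_gb(docs_per_day, avg_doc_size_bytes, retention_days, replicas=1):
    """Raw (pre-indexing) data volume in GiB, including replica copies."""
    primary_bytes = docs_per_day * avg_doc_size_bytes * retention_days
    total_bytes = primary_bytes * (1 + replicas)
    return total_bytes / 1024**3


if __name__ == "__main__":
    # Hypothetical workload: 10 million 500-byte documents a day, kept 30 days.
    estimate = raw_storage_gb(10_000_000, 500, 30, replicas=1)
    print(f"about {estimate:.0f} GiB of raw data including replicas")
```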
21
Sizing: Capacity planning test I
FIRST: test on a single node with a single index with one shard and no replicas
THEN: insert as many documents as you can and run some typical queries
At some point, queries will slow down to the point where they no longer meet your requirements
This is the maximum number of documents a single shard is able to hold, i.e. the ideal shard size
NEXT: find the ideal number of primary shards by dividing your dataset size by the ideal shard size
FINALLY: add replicas for HA and to improve read throughput
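The NEXT and FINALLY steps are plain arithmetic and can be sketched as follows. The sizes are expressed in GB here, but any unit works as long as both values use the same one; rounding up is an assumption so that no shard exceeds the measured ideal size.

```python
import math


def primary_shard_count(dataset_size_gb, ideal_shard_size_gb):
    """Dataset size divided by the ideal shard size, rounded up."""
    if ideal_shard_size_gb <= 0:
        raise ValueError("ideal shard size must be positive")
    return max(1, math.ceil(dataset_size_gb / ideal_shard_size_gb))


def total_shard_count(primaries, replicas):
    """Primaries plus their replica copies."""
    return primaries * (1 + replicas)


if __name__ == "__main__":
    # Hypothetical numbers: 100 GB dataset, 30 GB ideal shard, 1 replica.
    primaries = primary_shard_count(100, 30)
    print("primary shards:", primaries)
    print("total shards with 1 replica:", total_shard_count(primaries, 1))
```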
22
Sizing: Capacity planning test II
Each experiment tries to accomplish a discrete goal and builds upon the previous one:
1. Determine various disk utilization
2. Determine breaking point of a shard
3. Determine saturation point of a node
4. Test desired configuration on a two-node cluster
24
Hot / Warm architecture
When to use it?
• Elasticsearch for larger time-data analytics use cases
• Using time-based indices
• Able to run an architecture with 3 different types of nodes
25
Hot / Warm architecture: Type of nodes
Master, Hot and Warm nodes:
• Master nodes: 3 dedicated master nodes
• Hot data nodes: perform all indexing and also hold the most recent daily indices (the data queried most frequently). Powerful machines with SSD storage
• Warm data nodes: handle a large number of read-only indices that are not queried frequently. Very large attached spinning disks
26
Hot / Warm architecture: tagging
Which node is doing what?
ES needs to know which servers host the hot nodes and which servers host the warm nodes
This can be achieved by assigning an arbitrary tag to each server (Hot/Warm)
Tag the node with node.box_type: xxx in elasticsearch.yml, where xxx is the tag (e.g. hot or warm)
OR start a node using ./bin/elasticsearch --node.box_type xxx
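Tagging the nodes is only half of the setup: each index also has to say which tag it requires. The slides do not show this step, so the sketch below is an assumption based on Elasticsearch's shard allocation filtering setting index.routing.allocation.require.<attribute>. It only builds the request and sends nothing; the index name, the host and the function names are hypothetical.

```python
import json


def allocation_settings(box_type):
    """Index settings that require shards to live on nodes with this tag."""
    return {"index.routing.allocation.require.box_type": box_type}


def settings_request(index_name, box_type, host="http://localhost:9200"):
    """Return (method, url, body) for an update-index-settings call."""
    url = f"{host}/{index_name}/_settings"
    return "PUT", url, json.dumps(allocation_settings(box_type))


if __name__ == "__main__":
    # Hypothetical daily index that is no longer written to: move it to warm.
    method, url, body = settings_request("logs-2016.06.15", "warm")
    print(method, url)
    print(body)
```

Changing the value from hot to warm on an existing index makes Elasticsearch relocate that index's shards to nodes carrying the warm tag.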
27
Hot / Warm architecture: Force Merge API
Optimizing your indices on the warm nodes
The force merge API forces the merging of one or more indices, which optimizes them for faster search operations
The merge relates to the number of segments a Lucene index holds within each shard
The force merge operation reduces the number of segments by merging them:
$ curl -XPOST 'http://localhost:9200/my_index/_forcemerge'
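The same call can be prepared from Python using only the standard library. This sketch builds the request without sending it; the optional max_num_segments parameter is part of the force merge API, but choosing 1 for a read-only index is a common practice rather than something stated on the slide. The host and index name mirror the curl example, and the function name is illustrative.

```python
from urllib.parse import urlencode
from urllib.request import Request


def forcemerge_request(index_name, max_num_segments=None,
                       host="http://localhost:9200"):
    """Build (but do not send) a POST request to the _forcemerge endpoint."""
    url = f"{host}/{index_name}/_forcemerge"
    if max_num_segments is not None:
        url += "?" + urlencode({"max_num_segments": max_num_segments})
    return Request(url, method="POST")


if __name__ == "__main__":
    req = forcemerge_request("my_index", max_num_segments=1)
    print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would execute it against a running cluster. Force merging is I/O-intensive, so it is best run on indices that are no longer being written to.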
28
Hot / Warm architecture: Demo time!!
DEMO