elastic meetup june16

Miguel Bosin, Support Engineer, @miguelbosin

Hot/Warm Architecture + Sizing

Introduction

• Miguel Bosin
– Support engineer
– Joined in 2015
– Interested in technology
– Passionate about support

• Elastic
– Founded in 2012
– Distributed company
– Elasticsearch: what is it?
– Open source: ES, LS, Kibana and Beats
– Commercial: X-Pack


What is it? An open-source, distributed, scalable, highly available, document-oriented (JSON), RESTful full-text search engine with real-time search and analytics capabilities.

Agenda

1. Elastic overview
2. Elasticsearch basic architecture
3. Sizing introduction
4. Hot/Warm architecture

Overview of Elastic's current products

Agenda

Elastic overview

Elasticsearch basic architecture

Elasticsearch terminology

• A node is a single Elasticsearch instance, running in a single JVM. Multiple nodes can form a cluster.

• A cluster or a node can manage multiple indices. An index is a container for data.

• A shard is a single piece of an Elasticsearch index. A shard is either a primary or a replica.
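To make the node / index / shard vocabulary concrete, here is a toy model of how the copies of an index's shards could be spread over the nodes of a cluster. It is a sketch under stated assumptions: the round-robin placement and the names `ShardCopy` and `allocate` are hypothetical and this is not Elasticsearch's real shard balancer. It only keeps the one rule Elasticsearch does enforce: a replica never sits on the same node as its primary, and a copy that cannot get its own node stays unassigned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardCopy:
    index: str
    shard: int      # shard number within the index
    primary: bool   # True for the primary copy, False for a replica


def allocate(index, num_primaries, num_replicas, nodes):
    """Toy round-robin allocator (hypothetical, NOT Elasticsearch's balancer).

    Every copy of a shard lands on a different node; copies that cannot get
    their own node are returned as unassigned.
    """
    layout = {node: [] for node in nodes}
    unassigned = []
    for shard in range(num_primaries):
        for copy in range(num_replicas + 1):
            shard_copy = ShardCopy(index, shard, primary=(copy == 0))
            if copy < len(nodes):
                # (shard + copy) % len(nodes) picks a different node for each copy of this shard
                layout[nodes[(shard + copy) % len(nodes)]].append(shard_copy)
            else:
                unassigned.append(shard_copy)
    return layout, unassigned


layout, unassigned = allocate("logs", num_primaries=3, num_replicas=1,
                              nodes=["node-1", "node-2", "node-3"])
for node, copies in layout.items():
    print(node, [(c.shard, "P" if c.primary else "R") for c in copies])
```

With a single node, the same call leaves every replica unassigned, which is what a one-node cluster with replicas configured looks like (yellow health).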

Elasticsearch terminology II

Elasticsearch terminology III

Elasticsearch Architecture: Node roles

Master node:

• coordinates the cluster
• the only node able to apply changes to the cluster state
• publishes the updated cluster state to all nodes

Data node:

• holds shards locally and performs indexing and searches on them
• knows the cluster state

Elasticsearch Architecture: Node roles II

Client node:

• does NOT perform indexing or hold shards locally
• does NOT perform cluster management operations
• knows the cluster state
• acts as a smart load balancer (e.g. load balancing Kibana searches)
• redirects operations to the nodes that hold the relevant data
• calculates aggregation results

Node roles are set in elasticsearch.yml.
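As a small illustration of how the three roles map to settings, the sketch below renders the elasticsearch.yml lines for each role. It assumes the 2.x/5.x-era flags `node.master` and `node.data` (on 5.x a pure coordinating node also sets `node.ingest: false`; later releases replace these flags with `node.roles`). The `ROLE_SETTINGS` table and `render_yml` helper are hypothetical names for this example only.

```python
# Role flags as used in elasticsearch.yml on Elasticsearch 2.x/5.x (assumption:
# newer releases use a single node.roles list instead).
ROLE_SETTINGS = {
    "master": {"node.master": True, "node.data": False},   # dedicated master
    "data": {"node.master": False, "node.data": True},
    "client": {"node.master": False, "node.data": False},  # client / coordinating-only
}


def render_yml(role):
    """Return the elasticsearch.yml lines for one node role (hypothetical helper)."""
    return "\n".join(f"{key}: {str(value).lower()}"
                     for key, value in ROLE_SETTINGS[role].items())


for role in ROLE_SETTINGS:
    print(f"# {role} node")
    print(render_yml(role))
```

A node with both flags left at their default (`true`) is master-eligible and holds data at the same time, which is the setup the next slides argue against for larger clusters.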

Elasticsearch Architecture: Node roles III

Architecture: node roles

Architecture special case: dedicated master nodes

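Dedicated master nodes: why, and minimum_master_nodes

Indexing and searching data is CPU-, memory-, and I/O-intensive work that can put pressure on a node's resources; a master that also does this work can become unresponsive and destabilize the whole cluster.

Avoiding split brain: two nodes acting as master in the same cluster at the same time, which can lead to DATA LOSS.

Set discovery.zen.minimum_master_nodes to a quorum of the master-eligible nodes:

(master_eligible_nodes / 2) + 1, using integer division (e.g. 2 for 3 master-eligible nodes)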
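The quorum formula can be checked with a few lines of arithmetic. The sketch below (helper names are hypothetical) computes the value for discovery.zen.minimum_master_nodes and shows why it prevents split brain: when the master-eligible nodes are split into two partitions, at most one side can still see a quorum and elect a master.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum: (master_eligible_nodes / 2) + 1 with integer division."""
    return master_eligible_nodes // 2 + 1


def can_elect_master(partition_size, master_eligible_nodes):
    """A partition may elect a master only if it still sees a quorum of master-eligible nodes."""
    return partition_size >= minimum_master_nodes(master_eligible_nodes)


# 3 master-eligible nodes split 2 / 1: only the larger side can elect a master.
print(minimum_master_nodes(3), can_elect_master(2, 3), can_elect_master(1, 3))
```

This is also why 3 dedicated master nodes is a common choice: the cluster tolerates losing one of them and still has a quorum of 2.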

Agenda

Elastic overview

Sizing introduction

Sizing: general factors (server capacity)

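• Disks (SSD vs. HDD)

• RAM
– 1/2 of total RAM for the ES heap (the rest is left to the OS filesystem cache, which Lucene relies on)
– Max ES heap size: 30.5 GB (stays below the ~32 GB limit for compressed object pointers)

• Number of CPU cores
– ES thread pool concept

**1 shard → 1 search thread per request → 1 core** (all of a node's shards run inside one JVM process)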
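The RAM and CPU rules above reduce to simple arithmetic. In this sketch the heap is half the machine's RAM capped at 30.5 GB, as on the slide. The search thread pool size uses the default formula `int((cores * 3) / 2) + 1` from the 2.x to 6.x releases; treat that as an assumption and check the thread pool documentation for your version. Because each shard-level search uses one thread, this pool size bounds how many shard searches a node runs at once. The helper names are hypothetical.

```python
def es_heap_gb(total_ram_gb, max_heap_gb=30.5):
    """Half of the machine's RAM for the ES heap, capped at 30.5 GB."""
    return min(total_ram_gb / 2, max_heap_gb)


def default_search_threads(cores):
    """Default search thread pool size (assumed 2.x-6.x formula): int((cores * 3) / 2) + 1."""
    return (cores * 3) // 2 + 1


for ram_gb, cores in [(16, 4), (64, 8), (128, 16)]:
    print(f"{ram_gb} GB RAM -> heap {es_heap_gb(ram_gb)} GB; "
          f"{cores} cores -> {default_search_threads(cores)} search threads")
```

Past 64 GB of RAM the heap no longer grows; the extra memory still helps, because it all goes to the filesystem cache.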

Sizing: Elasticsearch factors (logging case)

• Size of shards
• Number of shards on each node
• Retention period of data
• Mapping configuration – which fields are searchable, whether _source is enabled or not, etc.
• Average size of the documents
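For a logging cluster, these factors combine into a rough disk estimate. The sketch below is a back-of-the-envelope model, not an official formula. `index_to_raw_ratio` (on-disk index size divided by raw log size) is an assumption you have to measure yourself, because it depends on the mapping and on whether `_source` is enabled. The input numbers in the example are purely illustrative.

```python
import math


def storage_needed_gb(daily_raw_gb, retention_days, replicas, index_to_raw_ratio):
    """Disk needed to keep retention_days of logs.

    index_to_raw_ratio is an assumption to measure (depends on mapping, _source, ...).
    """
    primary_gb = daily_raw_gb * index_to_raw_ratio * retention_days
    return primary_gb * (1 + replicas)  # each replica is a full copy of the primary data


def nodes_needed(total_gb, usable_disk_per_node_gb):
    """Number of data nodes, rounding up."""
    return math.ceil(total_gb / usable_disk_per_node_gb)


# Illustrative inputs only.
total = storage_needed_gb(daily_raw_gb=50, retention_days=30, replicas=1, index_to_raw_ratio=1.1)
print(round(total), "GB ->", nodes_needed(total, usable_disk_per_node_gb=1000), "data nodes")
```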

Sizing: Capacity planning test I

FIRST: test on a single node with a single index that has one shard and no replicas.

THEN: index as many documents as you can while running some typical queries.

At some point, queries will slow down past a threshold and no longer meet your requirements.

The document count at that point is the ideal number of documents a single shard can hold.

NEXT: find the ideal number of primary shards by dividing your dataset size by the ideal shard size.

FINALLY: add replicas for high availability (HA) and to improve read throughput.
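The last three steps are arithmetic on your test results. The sketch below assumes you recorded `(doc_count, latency_ms)` pairs while filling the single shard. It takes the last measurement still under your latency requirement as the shard's breaking point, then divides the dataset by it, rounding up. The measurements and helper names are hypothetical.

```python
import math


def breaking_point(measurements, max_latency_ms):
    """Largest doc count whose query latency still met the requirement.

    measurements: (doc_count, latency_ms) pairs in increasing doc_count order.
    """
    best = None
    for doc_count, latency_ms in measurements:
        if latency_ms > max_latency_ms:
            break  # first measurement over the threshold: the shard has hit its limit
        best = doc_count
    return best


def primary_shards(dataset_docs, docs_per_shard):
    """Dataset size divided by the ideal shard size, rounded up."""
    return math.ceil(dataset_docs / docs_per_shard)


# Illustrative measurements only; use the numbers from your own test.
samples = [(10_000_000, 40), (20_000_000, 70), (30_000_000, 120), (40_000_000, 250)]
per_shard = breaking_point(samples, max_latency_ms=150)
primaries = primary_shards(100_000_000, per_shard)
replicas = 1
print(per_shard, primaries, primaries * (1 + replicas))  # total shard copies with replicas
```

Replicas add copies but not capacity for new data: with one replica, the cluster must host twice as many shards, spread so that no primary shares a node with its replica.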

Sizing: Capacity planning test II

Each experiment tries to accomplish a discrete goal and builds upon the previous one:

1. Determine disk utilization under various configurations
2. Determine the breaking point of a shard
3. Determine the saturation point of a node
4. Test the desired configuration on a two-node cluster

Agenda

Elastic overview

Sizing introduction

Hot/Warm architecture

Hot / Warm architecture

When to use it?

• Elasticsearch for larger time-based data analytics use cases
• Using time-based indices
• Able to run an architecture with 3 different types of nodes
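Time-based indices make the hot/warm split easy to express: an index's age follows from its name. The sketch below assumes daily indices named like `logstash-2016.06.14` (the Logstash default daily pattern) and a hypothetical `hot_days` policy. It lists the indices old enough to move to the warm nodes.

```python
from datetime import date, datetime, timedelta


def indices_to_move_to_warm(index_names, today, hot_days, prefix="logstash-"):
    """Daily indices older than hot_days, oldest first.

    Assumes names like "logstash-2016.06.14"; other names are ignored.
    """
    cutoff = today - timedelta(days=hot_days)
    old = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        if day < cutoff:
            old.append((day, name))
    return [name for _, name in sorted(old)]


names = ["logstash-2016.06.10", "logstash-2016.06.13",
         "logstash-2016.06.14", "logstash-2016.06.15"]
print(indices_to_move_to_warm(names, today=date(2016, 6, 15), hot_days=2))
```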

Hot / Warm architecture: Type of nodes

Master, hot and warm nodes:

• Master nodes: 3 dedicated master nodes
• Hot data nodes: perform all indexing and also hold the most recent daily indices (the data queried most frequently). Powerful machines with SSD storage.
• Warm data nodes: handle a large number of read-only indices that are not queried frequently. Very large attached spinning disks.

Hot / Warm architecture: tagging

Which node is doing what?

ES needs to know which servers run the hot nodes and which run the warm nodes.

This can be achieved by assigning an arbitrary tag to each node (hot or warm).

Tag the node with node.box_type: xxx in elasticsearch.yml (xxx being hot or warm)

OR start the node with ./bin/elasticsearch --node.box_type xxx
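Once nodes carry a `box_type` tag, each index is pinned to hot or warm nodes with shard allocation filtering (`index.routing.allocation.require.box_type`). New indices typically get `hot` (e.g. from an index template). Updating an older index to `warm` makes Elasticsearch relocate its shards. The sketch below only builds the `PUT /<index>/_settings` request with the standard library and does not send it. The `allocation_request` helper and the `localhost:9200` address are assumptions for the example.

```python
import json
import urllib.request


def allocation_request(index, box_type, host="http://localhost:9200"):
    """Build (but do not send) a PUT _settings request that requires the index's
    shards to live on nodes tagged with the given box_type."""
    body = {"index.routing.allocation.require.box_type": box_type}
    return urllib.request.Request(
        f"{host}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = allocation_request("logstash-2016.06.10", "warm")
print(req.get_method(), req.full_url, req.data.decode())
# urllib.request.urlopen(req) would send it to a running cluster.
```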

Hot / Warm architecture: Force Merge API

Optimizing your indices on the warm nodes

The force merge API forces a merge of one or more indices, optimizing them for faster search operations.

The merge relates to the number of segments the Lucene index in each shard holds; force merging reduces that number by merging segments:

$ curl -XPOST 'http://localhost:9200/my_index/_forcemerge'
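A common variant passes `max_num_segments=1`, which merges each shard down to a single segment. This fits warm indices because force merge should only be run on indices that no longer receive writes. The sketch below builds the equivalent of `curl -XPOST 'http://localhost:9200/my_index/_forcemerge?max_num_segments=1'` with the standard library without sending it. The helper name and host are assumptions.

```python
import urllib.request
from urllib.parse import urlencode


def force_merge_request(index, max_num_segments=1, host="http://localhost:9200"):
    """Build (but do not send) a _forcemerge request for a read-only (warm) index."""
    query = urlencode({"max_num_segments": max_num_segments})
    return urllib.request.Request(f"{host}/{index}/_forcemerge?{query}", method="POST")


req = force_merge_request("my_index")
print(req.get_method(), req.full_url)
```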

Hot / Warm architecture: Demo time!!
