building the right platform architecture for hadoop

Building the right Platform Architecture for Hadoop

ROMMEL GARCIA HORTONWORKS

#whoami

▸ Sr. Solutions Engineer & Platform Architect @hortonworks

▸ Global Security SME Lead @hortonworks

▸ Author of “Virtualizing Hadoop: How to Install, Deploy, and Optimize Hadoop in a Virtualized Architecture”

▸ Runs Atlanta Hadoop User Group

OverviewThe Elephant keeps getting bigger

THE HADOOP ECOSYSTEM

THE TWO FACE OF HADOOP

Hadoop Cluster

Master Nodes Worker Nodes

Manage cluster state

Manage data & job state

Manage CPU resources

Manage RAM resources

Manage I/O resources

Manage Network resources

RUN W/

PLATFORM FOCUS

▸ Performance (SLA)

▸ Scale (Bursts)

▸ Speed (RT)

▸ Throughput (data/time, compute/time)

▸ Resiliency (HA)

▸ Graceful Degradation (Throttling/Failure Mgt.)

NetworkIt’s not Pyramidal

NETWORK IS EVERYTHING!

WorkloadsYour workload no longer affects me

HADOOP AND THE WORLD OF WORKLOADS

Workloads (YARN)

Hadoop Cluster

Deployment

OLAP OLTP DATA SCIENCE STREAMING

STRUCTURED UNSTRUCTUREDData (HDFS)

HADOOP WORKLOAD MANAGEMENT

Hadoop Cluster

PHYSICAL MEMORY/CPUqueues

containers

DEFAULT GRID SETTINGS FOR ALL WORKLOADS

▸ ext4 or XFS for Worker Nodes ENFORCED

▸ Transparent Huge Pages (THP) Compaction OFF

▸ Masters’ Swapiness = 1, Workers’ Swap turned OFF

▸ Jumbo Frames ENABLED

▸ IO Scheduling “deadline” ENABLED

▸ Limiting “processes” and “files” ENFORCED

▸ Name Service Cache ENABLED

OLAPGive back my precious SQL !

HIVE/SPARK SQL WORKLOAD BEHAVIOR

▸ Hive/Spark SQL workload, depends on YARN

▸ Near-realtime to Batch SLA

▸ Large data set, consumes lots of memory, fair cpu usage

▸ Hundreds to hundreds of thousands of analytical jobs daily

▸ Typically Memory bound first, then I/O

HIVE/SPARK SQL WORKLOAD PARALLELISM

▸ Hive

▸ Hive auto-parallelism ENABLED

▸ Hive on Tez ENABLED

▸ Reuse Tez Session ENABLED

▸ ORC for Hive ENFORCED

▸ Spark

▸ Repartition Spark SQL RDD ENFORCED

▸ 2GB to 4GB YARN container size

HIVE/SPARK SQL WORKLOAD DEPLOYMENT MODELS

Bare Metal

Master Node

▸ 2 x 6 CPU Cores

▸ 128 GB RAM

▸ 4 x 1TB SSD RAID 10 plus 1 Hot-spare

▸ 2 x 10Gbe NIC Bonded

Worker Node

▸ 2 x 8 CPU Cores

▸ 256 GB RAM

▸ 2 x 1TB SSD RAID 1

▸ 12 x 4TB SATA/SAS/NL-SAS

All Nodes are the same

▸ Typically more nodes vs bare metal

▸ Storage (AWS/Azure)

▸ Batch - S3/Blob/ADLS

▸ Interactive - EBS/Premium

▸ Near-realtime - Local/Local

▸ vm types (AWS/Azure)

▸ >=m4.4xlarge (EBS) / >= A9 (Blob/Premium)

▸ >=i2.4xlarge (Local) / >= DS14 (Premium/Local)

▸ >=d2.2xlarge (Local) / >= DS14_v2 (Premium/Local)

Virtualized On-prem

More Master Node vs Bare Metal

▸ 2 x 4 vCPU Cores

▸ 48 GB vRAM

▸ 1TB SAN/NAS

▸ 2 x 10Gbe vNIC Bonded

Worker Node

▸ 6 vCPU Cores

▸ 32 GB vRAM

▸ 1TB SAN/NAS

▸ Storage (data)

▸ SAN/NAS

▸ Appliance

▸ NetApp

▸ Isilon

▸ vBlock

OLTPCome on, OLTP is fun!

HBASE/SOLR WORKLOADS BEHAVIOR

▸ HBase & Solr workload

▸ Realtime, in sub-seconds SLA

▸ Large data set

▸ Millions to Billions of hits per day

▸ HBase

▸ HBase for GET/PUT/SCAN

▸ Typically IO bound first, then memory for HBase

▸ Solr

▸ Solr for Search

▸ Typically CPU bound first, then memory for Solr

HBASE WORKLOAD PARALLELISM

▸ Use HBase Java API/Phoenix, not ThriftServer

▸ HBase “short circuit” reads ENABLED

▸ HBase BucketCache ENABLED

▸ Manage HBase Compaction for block locality

▸ HBase Read HA ENABLED

▸ Properly design “rowkey” and ColumnFamily

▸ Properly design RPC Handlers

▸ Minimize GC pauses

HBASE WORKLOAD DEPLOYMENT MODEL

Bare Metal

Master Node (ZK, HBase Master)

▸ 2 x 6 CPU Cores

▸ 128 GB RAM

Worker Node

▸ 2 x 8-12 CPU Cores

▸ 256 GB RAM

▸ 12-24 x 4TB SATA/SAS/NL-SAS/SSD

▸ Local

▸ >= d2.4xlarge / >= D14_v2

Virtualized On-prem

▸ 48 GB vRAM

▸ 1TB SAN/NAS

Worker Node w/ DAS

▸ 256 GB RAM

▸ 12-24 x 4TB SATA/SAS/NL-SAS/SSD

SOLR WORKLOAD PARALLELISM

▸ # of Shards I/O operations Speed of disk

▸ # of threads Indexing Speed

▸ Properly manage RAM Buffer Size

▸ Merge Factor = 25 (indexing), 2 (search)

▸ Use large disk cache

▸ Minimize GC pauses

SOLR WORKLOAD DEPLOYMENT MODEL

Bare Metal

Master Node (ZK)

▸ 2 x 4 CPU Cores

▸ 32 GB RAM

▸ 2 x 1TB SSD (OS) RAID 1

Worker Node

▸ 2 x 10 CPU Cores

▸ 256 GB RAM

▸ 12 x 2 - 4TB SATA/SAS/NL-SAS/SSD

▸ Local

▸ >= d2.4xlarge / >= D14_v2

Virtualized On-prem

▸ 48 GB vRAM

▸ 1TB SAN/NAS

Worker Node w/ DAS

▸ 256 GB RAM

▸ 12 x 2 - 4TB SATA/SAS/NL-SAS/SSD

Data ScienceBut Captain, one does not simply warp into Mordor

DATA SCIENCE WORKLOAD BEHAVIOR

▸ Spark ML workload, depends on YARN

▸ Near-realtime to Batch SLA

▸ Runs “Monte Carlo” type of processing

▸ Hundreds to hundreds of thousands of analytical jobs daily

▸ CPU bound first, then memory

SPARK ML WORKLOAD PARALLELISM

▸ YARN ENABLED

▸ Cache data

▸ Serialized/Raw for fast access/processing

▸ Off-heap, slower processing

▸ Use Checkpointing

▸ Use Broadcast Variables to minimize network traffic

▸ Minimize GC by limiting object creation, use Builders

▸ executor-memory = between 8GB to 64GB

▸ executor-cores = between 2 to 4

▸ num-executors = w/ caching, datasize * 2 as total app memory

SPARK ML WORKLOAD DEPLOYMENT MODELS

Bare Metal

Master Node

▸ 2 x 6 CPU Cores

▸ 64 GB RAM

Worker Node

▸ 256 - 512 GB RAM

▸ 12 x 4TB SATA/SAS/NL-SAS

▸ Local

▸ >= d2.4xlarge / >= D14_v2

Virtualized On-prem

▸ 48 GB vRAM

▸ 1TB SAN/NAS

Worker Node w/ DAS

▸ 256 GB RAM

▸ 12-24 x 4TB SATA/SAS/NL-SAS

StreamingStream me in scotty !

NIFI/STORM WORKLOAD BEHAVIOR

▸ NiFi & Storm Streaming workloads

▸ Always “ON” data ingest and processing

▸ Guarantees data delivery

▸ Highly distributed

▸ simple event processing -> complex event processing

NIFI WORKLOAD PARALLELISM

▸ Network bound first, then memory or cpu

▸ Go granular on NiFi processors

▸ nifi.bored.yield.duration=10

▸ RAID 10 for repositories: Content, FlowFile and Provenance

▸ nifi.queue.swap.threshold=20000

▸ No sharing of disk across repositories, use RAID 10

NIFI WORKLOAD DEPLOYMENT MODELS

Bare Metal

Master Node

▸ 2 x 4 CPU Cores

▸ 32 GB RAM

Worker Node

▸ 2 x 8 CPU Cores

▸ 128 GB RAM

▸ 6 x 1TB SATA/SAS/NL-SAS RAID 10

▸ Local/EBS/Premium

▸ >= m4.4xlarge / >= A9

Virtualized On-prem

▸ 16 GB vRAM

▸ 1TB SAN/NAS

Worker Node w/ DAS

▸ 2 x 8 CPU Cores

▸ 128 GB RAM

▸ 6 x 1TB SATA/SAS/NL-SAS RAID 10

STORM WORKLOAD PARALLELISM

▸ CPU bound first, then memory

▸ Keep Tuple processing light, quick execution of execute()

▸ Use Google’s Guava for bolt-local caches (mem <1GB)

▸ Shared caches across bolts: HBase, Phoenix, Redis or MemcacheD

▸ # of workers memory

STORM WORKLOAD DEPLOYMENT MODELS

Bare Metal

Master Node

▸ 2 x 4 CPU Cores

▸ 32 GB RAM

Worker Node

▸ 256 GB RAM

▸ Local/EBS/Premium

▸ >= r3.4xlarge / >= A9

Virtualized On-prem

▸ 24 GB vRAM

▸ 1TB SAN/NAS

Worker Node w/ DAS

▸ 256 GB RAM

▸ 1TB SAN/NAS

http://hortonworks.com/products/sandbox/

THANK YOU!

@rommelgarcia /in/rommelgarcia

#hortonworks

building the right platform architecture for hadoop

Technology

hadoop programming - 2. hadoop architecture for data science

network traffic analysis using hadoop architecture

introduction to hadoop 2.0 & yarn | hadoop 2.0 & yarn...

hadoop as a platform for genomics

uniconnect hybrid data platform nov2017 - …...8. platform...

hpe reference architecture for hortonworks hdp 2.4 on hpe...

hdinsight essentials hadoop on microsoft platform

hortonworks data platform - hadoop security guide · pdf...

windows on hadoop: integrating the desktop and the apache...

whitepaper apache hadoop modern data architecture

hadoop tutorials. todays agenda hadoop introduction and...

introduction to hadoop-mapreduce platform

dockerized hadoop platform and recent updates in...

yarn - hadoop next generation compute platform

hadoop: the default machine learning platform ?

hadoop and modern data architecture

data platform for hadoop - nec(japan)...data platform for...

sql on hadoop technology, architecture &...

ebay experimentation platform on hadoop

hadoop overview & architecture