
Page 1: Managing growth in Production Hadoop Deployments

MANAGING GROWTH IN PRODUCTION HADOOP DEPLOYMENTS

Soam Acharya

@soamwork

Charles Wimmer

@cwimmer

Altiscale

@altiscale

HADOOP SUMMIT 2015

SAN JOSE

Page 2: Managing growth in Production Hadoop Deployments

2

ALTISCALE: INFRASTRUCTURE NERDS

• Soam Acharya - Head of Application Engineering
  • Formerly Chief Scientist @ Limelight OVP, Yahoo Research Engineer

• Charles Wimmer - Head of Operations
  • Former Yahoo! & LinkedIn SRE
  • Managed 40,000 nodes in Hadoop clusters at Yahoo!

• Hadoop as a Service, built and managed by Big Data, SaaS, and enterprise software veterans
  • Yahoo!, Google, LinkedIn, VMWare, Oracle, ...

Page 3: Managing growth in Production Hadoop Deployments

3

SO, YOU’VE PUT TOGETHER YOUR FIRST HADOOP DEPLOYMENT

● It’s now running production ETLs

Page 4: Managing growth in Production Hadoop Deployments

CONGRATULATIONS!

Page 5: Managing growth in Production Hadoop Deployments

5

BUT THEN ...

• Your data scientists get on the cluster and start building models

Page 6: Managing growth in Production Hadoop Deployments

6

BUT THEN ...

• Your data scientists get on the cluster and start building models

• Your BI team starts running interactive SQL-on-Hadoop queries ...

Page 7: Managing growth in Production Hadoop Deployments

7

BUT THEN ...

• Your data scientists get on the cluster and start building models

• Your BI team starts running interactive SQL-on-Hadoop queries ...

• Your mobile team starts sending real-time (RT) events into the cluster ...

Page 8: Managing growth in Production Hadoop Deployments

8

BUT THEN ...

• Your data scientists get on the cluster and start building models

• Your BI team starts running interactive SQL-on-Hadoop queries ...

• Your mobile team starts sending real-time (RT) events into the cluster ...

• You sign up more clients ...
• And the input data for your initial use case doubles ...

Page 9: Managing growth in Production Hadoop Deployments

9

SOON, YOUR CLUSTER ...

Page 10: Managing growth in Production Hadoop Deployments

10

AND YOU …

Page 11: Managing growth in Production Hadoop Deployments

11

THE “SUCCESS DISASTER” SCENARIO

● Initial success
● Many subsequent use cases land on the cluster
● Cluster gets bogged down

Page 12: Managing growth in Production Hadoop Deployments

12

WHY DO CLUSTERS FAIL?

• Failure categories:
1. Too much data

2. Too many jobs

3. Too many users

Page 13: Managing growth in Production Hadoop Deployments

13

HOW TO EXTRICATE YOURSELF?

• Short term strategy:
  • Get more resources for your cluster
  • Expand the cluster size!
  • More headroom for the longer term strategy

• Longer term strategy

Page 14: Managing growth in Production Hadoop Deployments

14

LONGER TERM STRATEGY

• Can’t cover every scenario

• Per failure category:
  • Selected pressure points (PPs)
  • Can occur at different levels of the Hadoop stack

• Identify and shore up pressure points

• Squeeze more capacity from cluster

Page 15: Managing growth in Production Hadoop Deployments

15

HADOOP 2 STACK REMINDER

Application Layer

Execution Framework

Core Hadoop Layer

Machine Level

[Stack diagram: applications (Hive, SparkSQL, Cascading, Pig) run on execution frameworks (MapReduce, Spark, Tez), which run on YARN (ResourceManager, NodeManagers) and HDFS (NameNode, DataNodes), which in turn sit on machine-level resources (CPU, RAM, disk, network, OS, hardware).]

Page 16: Managing growth in Production Hadoop Deployments

16

FAILURE CATEGORY 1 - TOO MUCH DATA


PP: HDFS at capacity

PP: Too many objects

[Same Hadoop 2 stack diagram, with both pressure points highlighted at the HDFS layer (NameNode/DataNodes).]

Page 17: Managing growth in Production Hadoop Deployments

17

PRESSURE POINT - HDFS AT CAPACITY

• Unpredictable cluster behavior
  • Transient errors
  • Hadoop daemons can’t save logs to HDFS

• Execution framework errors:
  • Hive unable to run queries that create temp tables

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/hive-user/hive_2014-07-23_08-43-40_408_2604848084853512498-1/_task_tmp.-ext-10001/_tmp.000121_0 could only be replicated to 1 nodes instead of minReplication (=2). There are xx datanode(s) running and no node(s) are excluded in this operation.

Page 18: Managing growth in Production Hadoop Deployments

18

HDFS AT CAPACITY MITIGATION

● Use HDFS quotas!

hdfs dfsadmin -setSpaceQuota 113367670 /

● Quotas can be set per directory
● Cannot be set per user
● Protection against accidental cluster destabilization
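As a rough sketch of the workflow (the directory path and quota size here are only illustrative), setting, checking, and clearing a space quota looks like this; note that a space quota counts raw bytes, i.e. after replication:

# set a 10 TB space quota on a project directory (counts replicated bytes)
hdfs dfsadmin -setSpaceQuota 10t /user/etl/staging

# verify: output includes quota, remaining quota, space quota, remaining space quota
hdfs dfs -count -q /user/etl/staging

# remove the quota again
hdfs dfsadmin -clrSpaceQuota /user/etl/staging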

Page 19: Managing growth in Production Hadoop Deployments

19

TOO MANY OBJECTS

“Elephants are afraid of mice. Hadoop is afraid of small files.”

# of dirs + files

# of blocks

Page 20: Managing growth in Production Hadoop Deployments

20

TOO MANY OBJECTS

● Memory pressure:
  o Namenode heap: too many files + directories + objects in HDFS
  o Datanode heap: too many blocks allocated per node

● Performance overhead:
  o Too much time spent on container creation and teardown
  o More time spent in the execution framework than in the actual application

Page 21: Managing growth in Production Hadoop Deployments

21

WHERE ARE THE OBJECTS?

Use HDFS count:

hdfs dfs -count -q <directory name>

● Number of directories, files, and bytes
● On a per-directory basis

Use fsimage files:

● Can be produced by NN

hdfs oiv -i <fsimage file> -o <output file> -p XML

● Detailed breakdown of the HDFS file system
● Hard to digest raw!
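A minimal sketch of the fsimage route, assuming HDFS superuser access; the paths and the <txid> placeholder are illustrative. hdfs dfsadmin -fetchImage pulls the latest checkpoint from the NameNode, and the offline image viewer (oiv) turns it into something greppable:

# grab the most recent fsimage checkpoint from the active NameNode
hdfs dfsadmin -fetchImage /tmp/fsimage-dump

# convert it to XML with the offline image viewer
hdfs oiv -i /tmp/fsimage-dump/fsimage_<txid> -p XML -o /tmp/fsimage.xml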

Page 22: Managing growth in Production Hadoop Deployments

22

TOO MANY OBJECTS - MITIGATION

• Short term:
  • Increase NN/DN heap sizes (see the sketch after this list)
  • Node physical limits
  • Increase cluster node count

• Longer term:
  • Find and compact
  • Coalesce multiple files
  • Use HAR
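For the heap-size bump, a hedged sketch in hadoop-env.sh on the NameNode and DataNodes; the values are illustrative, not recommendations - size against your actual object counts (a commonly cited rule of thumb is on the order of 1 GB of NameNode heap per million HDFS objects):

# illustrative heap settings; tune to your cluster's object and block counts
export HADOOP_NAMENODE_OPTS="-Xms16g -Xmx16g ${HADOOP_NAMENODE_OPTS}"
export HADOOP_DATANODE_OPTS="-Xms4g -Xmx4g ${HADOOP_DATANODE_OPTS}"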

Page 23: Managing growth in Production Hadoop Deployments

23

COALESCE MULTIPLE FILES I

• Hadoop streaming job

• Whatever Hadoop can read on cluster

• LZO output

hadoop \
  jar /opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -D mapreduce.job.reduces=40 \
  -D mapred.output.compress=true \
  -D mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec \
  -D mapreduce.output.fileoutputformat.compress.type=BLOCK \
  -D mapreduce.reduce.memory.mb=8192 \
  -mapper /bin/cat \
  -reducer /bin/cat \
  -input $IN_DIR \
  -output $DIR

Page 24: Managing growth in Production Hadoop Deployments

24

COALESCE MULTIPLE FILES II

● Build index for LZO output

● Tell hadoop where the splits are

hadoop \
  jar /opt/hadoop/share/hadoop/common/lib/hadoop-lzo-*.jar \
  com.hadoop.compression.lzo.DistributedLzoIndexer \
  $DIR

Page 25: Managing growth in Production Hadoop Deployments

25

COMBINE FILES INTO HAR

• HAR: Hadoop Archive

hadoop archive -archiveName <archive name>.har -p <HDFS parent path> <dir1> <dir2> ... <outputDir>

• MR job to produce archive

• Watch out for the replication factor
  • On versions 2.4 and earlier, source files are set to a default replication factor of 10
  • Not good for small clusters
  • -r <replication factor> option added in 2.6
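On 2.6+, a hedged example of archiving with an explicit replication factor (the archive name and directory names are illustrative):

hadoop archive -archiveName logs-2015.har -p /data/logs -r 3 day01 day02 /archives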

Page 26: Managing growth in Production Hadoop Deployments

26

COMBINE FILES INTO HAR

• HAR archives are useful if you want to preserve the file/directory structure of input

[alti_soam@desktop ~]$ hdfs dfs -ls har:///tmp/alti_soam_test.har
Found 3 items
drwxr-xr-x   - alti_soam hdfs  0 2013-09-03 22:44 har:/tmp/alti_soam_test.har/examples
drwxr-xr-x   - alti_soam hdfs  0 2013-11-16 03:53 har:/tmp/alti_soam_test.har/test-pig-avro-dir
drwxr-xr-x   - alti_soam hdfs  0 2013-11-12 22:23 har:/tmp/alti_soam_test.har/test-camus

Page 27: Managing growth in Production Hadoop Deployments

27

FAILURE CATEGORY 2 - TOO MANY JOBS

“Help! My job is stuck!”

[Same Hadoop 2 stack diagram, annotated with job-level symptoms at the YARN layer:]

Jobs don’t make progress

Jobs don’t start

“Right” jobs finish last

Mixed profile job issues

Page 28: Managing growth in Production Hadoop Deployments

28

TOO MANY JOBS REMEDIATION

• Need to quantify job processing on cluster

• Hadoop job usage analysis:
  • Resource Manager logs
  • History Server logs, job history files
  • APIs

• Analysis goals:
  • Queue usage => cluster utilization
  • Time spent by jobs/containers in the waiting state
  • Job level stats
    • # of jobs, type of jobs …

• Queue tuning

Page 29: Managing growth in Production Hadoop Deployments

29

HADOOP LOGS - RESOURCE MANAGER

• Job stats (outcome, duration, start date)

• Queue used

• Container:
  • number allocated
  • memory, vCPU allocation
  • state transition times
  • outcome

Page 30: Managing growth in Production Hadoop Deployments

30

HADOOP LOGS - JOBHISTORY FILES

• Configure history server to produce files

• Created for every MR job
• HDFS data volume processed
• For mappers/reducers:
  • CPU time
  • memory used
  • start/end time
  • max parallel maps, reduces
  • GC time

• Not available for Tez/Spark:
  • Use the timeline server for better logging
  • Timeline server dependencies

Page 31: Managing growth in Production Hadoop Deployments

31

HADOOP LOG ANALYSIS

• Analysis goals:
  • Queue usage => cluster utilization
  • Time spent by jobs/containers in the waiting state
  • Job level stats:
    • # of jobs
    • Failed/killed vs successful
    • Type of jobs
  • Container level stats

• How to analyze logs?
  • Custom scripts
    • Parse job history files, Hadoop logs, and the REST APIs (a sketch follows below)
  • Data warehouse
  • Visualization

• Not much by the way of publicly available tools
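If you would rather not parse raw logs, the ResourceManager REST API exposes much of the same job-level data; a small sketch (the host name is a placeholder, and the exact fields returned vary by Hadoop version):

# all applications whose final status is FAILED
curl -s 'http://<rm-host>:8088/ws/v1/cluster/apps?finalStatus=FAILED'

# per-queue capacity and current usage from the scheduler
curl -s 'http://<rm-host>:8088/ws/v1/cluster/scheduler'

# cluster-wide container, memory, and vCore totals
curl -s 'http://<rm-host>:8088/ws/v1/cluster/metrics'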

Page 32: Managing growth in Production Hadoop Deployments

32

SAMPLE PLOT: CONTAINER WAIT TIME AND UTILIZATION PER QUEUE

[Plot panels: container wait times, queue utilization, and vCore usage, broken out per queue.]

Page 33: Managing growth in Production Hadoop Deployments

33

SAMPLE PLOT: DAILY JOB TYPE AND STATUS

Page 34: Managing growth in Production Hadoop Deployments

34

SAMPLE PLOT: DAILY JOB BREAKDOWN BY USER

Page 35: Managing growth in Production Hadoop Deployments

35

QUEUE TUNING STRATEGY

• Determine how you want your cluster to behave
• Pick a scheduler depending on that desired behavior

• Real world examples:
  • Production jobs must get resources
    • Dedicate a certain portion of the cluster regardless of cluster state (idle, at capacity) - see the sketch at the end of this slide
  • Data loading jobs
    • Constrain to a small portion of the cluster to preserve network bandwidth
  • Research jobs:
    • Small portion of the cluster at peak
    • Large portion of the cluster when idle
  • Divide up the cluster amongst business units
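As an illustration of the "production jobs must get resources" case, a sketch of the Capacity Scheduler side in capacity-scheduler.xml; the queue names and percentages are made up for this example, and the settings are shown as property = value for brevity (the real file is XML):

yarn.scheduler.capacity.root.queues = prod,research
yarn.scheduler.capacity.root.prod.capacity = 60
yarn.scheduler.capacity.root.prod.maximum-capacity = 100
yarn.scheduler.capacity.root.research.capacity = 40
yarn.scheduler.capacity.root.research.maximum-capacity = 80

# push the new queue definitions to the ResourceManager without a restart
yarn rmadmin -refreshQueues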

Page 36: Managing growth in Production Hadoop Deployments

36

QUEUE TUNING - SCHEDULER BASICS

Fair Scheduler vs. Capacity Scheduler

• Resource allocation:
  • Fair: jobs get an approximately equal share of resources over time
  • Capacity: each queue is allocated a fraction of the cluster

• Which resource?
  • Both: memory, and optionally CPU

• Inter-queue constraints:
  • Fair: max & min shares (X MB, Y vcores)
  • Capacity: min capacity => guaranteed fraction of the cluster when busy;
    max capacity => fraction the queue can grow to when the rest of the cluster is idle

• Intra-queue resource sharing:
  • Fair: pick a policy - FIFO, FAIR, or Dominant Resource Fairness
  • Capacity: tunable policy - with many users, the first x users each get 1/x;
    many jobs from a single user run FIFO

Page 37: Managing growth in Production Hadoop Deployments

37

MORE ON EACH SCHEDULER

• Fair Scheduler:
  • Hadoop Summit 2009
  • "Job Scheduling With the Fair and Capacity Schedulers" - Matei Zaharia

• Capacity Scheduler:
  • Hadoop Summit 2015 (5/9, 12:05pm)
  • "Towards SLA-based Scheduling on YARN" - Sumeet Singh, Nathan Roberts

Page 38: Managing growth in Production Hadoop Deployments

38

TOO MANY JOBS - MIXED PROFILE JOBS

• Jobs may have different memory profiles
  • Standard MR jobs: small container sizes
  • Newer execution frameworks (Spark, H2O):
    • Large container sizes
    • All-or-nothing scheduling

• A job with many little tasks
  • Can starve jobs that require large containers

Page 39: Managing growth in Production Hadoop Deployments

39

TOO MANY JOBS - MIXED PROFILE JOBS MITIGATION

• Reduce container sizes if possible
  • Always start with the lowest container sizes

• Node labels (YARN-2492) and gang scheduling (YARN-624)

• More details:
  • "Running Spark and MapReduce Together In Production" - David Chaiken
  • Hadoop Summit 2015, 06/09, 2:35pm

Page 40: Managing growth in Production Hadoop Deployments

40

TOO MANY JOBS - HARDENING YOUR CLUSTER

• Cluster configuration audit
  • Container vs. heap size (example below)

• Appropriate kernel level configuration

• Turn on Linux Container Executor

• Enable Hadoop Security

• Use operating system cgroups
  • Protect Hadoop daemons
  • Cage user processes:
    • Impala

• Limits on what Hadoop itself can control:
  • CPU
  • But not memory, network & disk bandwidth

An inconsistent pairing of the sort an audit should catch - the map JVM heap (-Xmx) is larger than the container it runs in:

mapreduce.map.memory.mb = 1536
mapreduce.map.java.opts = -Xmx2560m
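A consistent pairing, for contrast; a common guideline is to set the JVM heap to roughly 75-80% of the container size, leaving room for non-heap memory, though the exact ratio is workload dependent:

mapreduce.map.memory.mb = 1536
mapreduce.map.java.opts = -Xmx1228m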

Page 41: Managing growth in Production Hadoop Deployments

41

FAILURE CATEGORY 3 - TOO MANY USERS

Data access control

[Same Hadoop 2 stack diagram; the annotations mark the user-related pressure points:]

Inter-departmental resource contention

(too many jobs)

Page 42: Managing growth in Production Hadoop Deployments

42

TOO MANY USERS - QUEUE ACCESS

● Use queue ACLs
  o Restrict which users can submit jobs to a queue
  o Per-queue administrator roles:
    - submit job
    - administer job
  o Restrict whether users can view applications in another queue
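For the Capacity Scheduler, a hedged sketch of the relevant properties; the queue name, users, and groups are illustrative, and the settings are shown as property = value (the real files are XML). Remember that queue ACLs are evaluated hierarchically, so an open root ACL will undo a restrictive child:

yarn.acl.enable = true                                          # in yarn-site.xml
yarn.scheduler.capacity.root.acl_submit_applications = " "      # a single space means nobody - lock down root
yarn.scheduler.capacity.root.prod.acl_submit_applications = etl_user,bi_user etl_group   # format: "users groups"
yarn.scheduler.capacity.root.prod.acl_administer_queue = ops_user ops_group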

Page 43: Managing growth in Production Hadoop Deployments

43

DATA ACCESS CONTROL

• By default, Hadoop supports UNIX style file permissions

• Easy to circumvent

HADOOP_USER_NAME=hdfs hdfs dfs -rm /priv/data

• Use Kerberos
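With Kerberos enabled, identity comes from the ticket rather than an environment variable, so the spoof above no longer works; a minimal sketch (the principal and path are illustrative):

# authenticate first, then access HDFS as the proven identity
kinit analyst@EXAMPLE.COM
hdfs dfs -ls /priv/data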

Page 44: Managing growth in Production Hadoop Deployments

44

DATA ACCESS CONTROL - ACCOUNTABILITY

• HDFS audit logs
  • Produced by the NameNode

2015-02-24 20:59:45,382 INFO FSNamesystem.audit: allowed=true ugi=soam (auth:SIMPLE) ip=/10.251.255.181 cmd=delete src=/hive/what_a_con.db dst=/user/soam/.Trash/Current/hive/what_a_con.db perm=soam:hiveusers:rwxrwxr-x

“Who deleted that file?”
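With audit logging on, answering that question is usually a grep away; a sketch (the audit log path varies by distribution and configuration):

grep 'cmd=delete' /var/log/hadoop/hdfs/hdfs-audit.log | grep 'src=/hive/what_a_con.db'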

Page 45: Managing growth in Production Hadoop Deployments

45

SQUEEZE MORE CAPACITY FROM CLUSTER

Application Layer

Execution Framework

Core Hadoop Layer

[Same Hadoop 2 stack diagram spanning the application, execution framework, and core Hadoop layers.]

• Targeted upgrades, optimizations

Page 46: Managing growth in Production Hadoop Deployments

46

SQUEEZE MORE CAPACITY FROM CLUSTER

• Optimizations:
  • Application layer:
    • Query optimizations, algorithm-level optimizations

• Upgrading:
  • Execution framework:
    • Tremendous performance improvements in Hive/Tez and Spark over the past two years
    • Pig, Cascading all continue to improve
  • Hadoop layer:
    • Recent focus on security, stability

• Recommendation:
  • Focus on upgrading the execution framework

Page 47: Managing growth in Production Hadoop Deployments

47

QUESTIONS? COMMENTS?