4 supporting h base jeff, jon, kathleen - cloudera - final 2

52
Supporting HBase: How to Stabilize, Diagnose and Repair Jeff Bean, Jonathan Hsieh, Kathleen Ting {jwfbean,jon,kathleen}@cloudera.com 5/22/12 HBaseCon 2012. 5/22/12 Copyright 2012 Cloudera Inc. All rights reserved

Upload: cloudera-inc

Post on 10-May-2015

3.328 views

Category:

Technology


1 download

TRANSCRIPT

Page 1: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

Supporting HBase: How to Stabilize, Diagnose and Repair

Jeff Bean, Jonathan Hsieh, Kathleen Ting{jwfbean,jon,kathleen}@cloudera.com

5/22/12

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

Page 2: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

2

Who Are We?

• Jeff Bean• Designated Support Engineer, Cloudera• Education Program Lead, Cloudera

• Kathleen Ting• Support Manager, Cloudera• ZooKeeper Subject Matter Expert

• Jonathan Hsieh• Software Engineer, Cloudera• Apache HBase Committer and PMC member

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

Page 3: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

3

Outline

• Preventative HBase Medicine: • Tips for a healthy HBase

• The HBase Triage:• Fixes for acute HBase pains

• The HBase Surgery:• Repairing a Corrupted HBase

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

Page 4: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

4

Outline

• Preventative HBase Medicine: • Tips for a healthy HBase

• The HBase Triage:• Fixes for acute HBase pains

• The HBase Surgery:• Repairing a Corrupted HBase

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

Page 5: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

5

“Monitor your system, exercise your workload, and eat your vegetables.”

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

Page 6: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

6HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

Page 7: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

7HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

Page 8: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

8

HBase Cross-Section

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

Disk / Network

JVM / Linux

ZooKeeper HDFS

HBase

App MR

Page 9: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

9

Doctor’s Advice: “A ounce of prevention worth a pound of cure.”

• Understand your workload and test for it

• Size your cluster properly (see Cluster Sizer)

• Monitor, alert, and manage your cluster with Ganglia, Nagios, and/or Cloudera Manager• Don’t be Dr. House!

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

Page 10: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

10Copyright 2012 Cloudera Inc. All rights reserved

A Case Study

Page 11: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

11

Symptom: Long Running MapReduce job with blacklisted TaskTrackers

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

TaskTracker No. of Failures

NodeX 4

NodeY 3

NodeQ 7

NodeB 10

NodeP 8

NodeV 6

Disk / Network

JVM / Linux

ZK HDFS

HBase

App MR

Page 12: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

12

Symptom: Node B Task Logs

$ find . | xargs grep "giving up“./attempt_201107261334_0221_m_000962_1/syslog:2011-08-02

11:09:34,248 INFO org.apache.hadoop.ipc.HbaseRPC: Server at NodeA:60020 could not be reached after 1 tries, giving up.

./attempt_201107261334_0221_m_000962_1/syslog:2011-08-02 11:09:37,328 INFO org.apache.hadoop.ipc.HbaseRPC: Server at NodeA:60020 could not be reached after 1 tries, giving up.

./attempt_201107261334_0221_m_000962_1/syslog:2011-08-02 11:09:40,465 INFO org.apache.hadoop.ipc.HbaseRPC: Server at NodeA:60020 could not be reached after 1 tries, giving up.

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

Disk / Network

JVM / Linux

ZK HDFS

HBase

App MR

Page 13: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

13

Symptom: RegionServer logs of Node A:

2011-08-02 11:04:20,324 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer

: ABORTING region server serverName=NodeA,60020,1312228900706, load=(requests=10847, regions=342, usedHeap=8193, maxHeap=15350): regions

erver:60020-0x4316487a73e1626 regionserver:60020-0x4316487a73e1626 received expired from ZooKeeper, aborting

org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

Disk / Network

JVM / Linux

ZK HDFS

HBase

App MR

Page 14: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

14

Cascading failure! Some other node says ouch…2011-08-01 12:55:39,356 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook:

Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=Thread[Thread-15,5,main]

2011-08-01 12:55:39,629 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Shutdown hook

2011-08-01 12:55:39,629 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown hook thread.

2011-08-01 12:55:39,695 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase/.logsNodeA,60020,1311651881177NodeA%3A60020.1311656326143 : java.io.IOException: Error Recovery for block blk_1102151039331207284_16350929 failed because recovery from primary datanode NodeA:50010 failed 6 times. Pipeline was NodeA:50010. Aborting...

java.io.IOException: Error Recovery for block blk_1102151039331207284_16350929 failed because recovery from primary datanode NodeA:50010 failed 6 times. Pipeline was NodeA:50010. Aborting...

at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2841)

at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2305)

at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2477)

2011-08-01 12:55:39,842 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook finished.

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

Disk / Network

JVM / Linux

ZK HDFS

HBase

App MR

Page 15: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

15

Symptom: Ganglia Memory Graph on Node A…

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

Disk / Network

JVM / Linux

ZK HDFS

HBase

App MR

Page 16: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

16

Symptom: Ganglia swap_free on Node A…

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

Disk / Network

JVM / Linux

ZK HDFS

HBase

App MR

Page 17: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

17

A Case study: Radiant Pain“I was having back pains, and it turned out to be my heart!”

• Too many MR Slots• MR Slots too large• Too many non-HBase

small files (HDFS-2379)

Node A Under Load

• “Arbitrary” processes pause or unresponsive

Node A swaps• MapReduce tasks fail• HDFS datanode

operations time out• HBase client operations

fail

Node B can’t connect to node A

• JobTracker blacklists TT on node B

• Jobs fail or run slow• NameNode re-replicates

blocks from node A

Masters Take Action

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

Page 18: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

18

Event Trail and Evidence Trail

Node A condition

(load)

Node A event (swap)

Node B symptom (connect)

Master Action

(blacklist)

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

Node A Monitoring

Transient swap not logged!

Node B Logs

Master Logs and

UIs

!!!?!?

Page 19: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

19

DOs and DON’Ts for keeping HBase Healthy

DOs• Monitor and Alert• Optimize network• Know your logs

DON’Ts• Swap• Oversubscribe MR• Share the network

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

Page 20: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

20

Outline

• Preventative HBase Medicine: • Tips for a healthy HBase

• The HBase Triage:• Fixes for acute HBase pains

• The HBase Surgery:• Repairing a Corrupted HBase

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

Page 21: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

“Cloudera 911 here, how can we help?”

Page 22: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

22

HBase Support Tickets

44%

12%

16%

28%

HBase, ZK, MR, HDFS MisconfigPatch RequiredFix HW/NWRepair Needed

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

Page 23: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

23

Understanding the logs helps us diagnose issues

• Related events logged by different processes in different places• Log messages point at each other• HDFS accesses by RS logged by NN and DN• HBase accesses by MR logged by JT, RS, NN, ZK• ZK logs indicate HBase health

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

Page 24: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

24Copyright 2012 Cloudera Inc. All rights reserved

The HBase Triage: Fixes for acute HBase pains

• Severe Pain

• Complete Unconsciousness

Page 25: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

25Copyright 2012 Cloudera Inc. All rights reserved

The HBase Triage: Fixes for acute HBase pains

• Severe Pain

• Complete Unconsciousness

Page 26: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

26

Connection Reset

WARN - Session <id> for server <server id>, unexpected error, closing socket connection and attempting reconnect java.io.IOException: Connection reset by peer

What causes this?• Running out of ZK connections

How can it be resolved?• Manually close connections• Fixed in HBASE-5466 and HBASE-4773

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

Disk / Network

JVM / Linux

ZK HDFS

HBase

App MR

Page 27: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

27

Running out of DN Threads & File Descriptors

INFO hdfs.DFSClient: Could not obtain block <blk id> from any node: java.io.IOException: No live nodes contain current block. ERROR java.io.IOException: Too many open files

What causes this? • HBase likes to keep data files open

How can it be resolved? • Increase dfs.datanode.max.xcievers to 4096• Increase /etc/security/limits.conf

• hbase - nofile 32768 HBaseCon 2012. 5/22/12

Copyright 2012 Cloudera Inc. All rights reserved

Disk / Network

JVM / Linux

ZK HDFS

HBase

App MR

Page 28: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

28

“Long Garbage Collecting Pause”

WARN org.apache.hadoop.hbase.util.Sleeper: We slept 19118ms instead of 1000ms, this is likely due to a long garbage collecting pause and it's usually bad

How can it be resolved?• zoo.cfg: maxSessionTimeout=180000

hbase-site.xml: zookeeper.session.timeout=180000

• Oversubscribed if MR & HBase are co-located

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

Disk / Network

JVM / Linux

ZK HDFS

HBase

App MR

Page 29: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

29

Heap Allocation Per Node

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

(Map + Red) x Child Heap +

DN heap +

TT heap +

RS heap +

OS (20% of RAM)

Total RAM

Page 30: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

30Copyright 2012 Cloudera Inc. All rights reserved

The HBase Triage: Fixes for acute HBase pains

• Severe Pain

• Complete Unconsciousness

Page 31: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

31

ZK can’t start & HBase hangs

INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file <name> retrying…

What causes this? • High dfs.replication.min causes HBase hang -

can’t close file until created all replicasHow can it be resolved? • Remove dfs.replication.min• Temp increase dfs.balance.bandwidthPerSec• Fixed in HDFS-2936

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

Disk / Network

JVM / Linux

ZK HDFS

HBase

App MR

Page 32: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

32

Unable to Load Database

FATAL org.apache.zookeeper.server.quorum.QuorumPeer: Unable to load database on disk

What causes this? • ZK data directories filled up

How can it be resolved? • Wipe out /var/zookeeper/version-2 • Run zkCleanup.sh script via cron

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

Disk / Network

JVM / Linux

ZK HDFS

HBase

App MR

Page 33: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

33

Downed HBase Master and RegionServers

WARN org.apache.zookeeper.server.quorum.Learner: Exception when following the leader java.net.SocketTimeoutException: Read timed out

What causes this? • Session Timeout + Session Expiration = NW Prob

How can it be resolved?• Monitor network (e.g. ifconfig)• Run ≥ 3 ZK servers (majority rules)

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

Disk / Network

JVM / Linux

ZK HDFS

HBase

App MR

Page 34: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

34Copyright 2012 Cloudera Inc. All rights reserved

The HBase Triage: Fixes for acute HBase pains

• Severe Pain

• Complete Unconsciousness

Page 35: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

35

Outline

• Preventative HBase Medicine: • Tips for a healthy HBase

• The HBase Triage:• Fixes for acute HBase pains

• The HBase Surgery:• Repairing a Corrupted HBase

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

Page 36: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

36

“To the operating room, please”

• Hbase refuses to start• Hbase’s HBCK reports inconsistencies

Page 37: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

37

HBase Support Tickets

44%

12%

16%

28%

HBase, ZK, MR, HDFS MisconfigPatch RequiredFix HW/NWRepair Needed

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

Page 38: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

38

HBase Support Tickets

44%

12%

16%

28%

HBase, ZK, MR, HDFS MisconfigPatch RequiredFix HW/NWRepair Needed

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

Page 39: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

39Copyright 2012 Cloudera Inc. All rights reserved

Detecting internal problems with hbck

• HBase since 0.90 has included a tool for scanning an HBase instance’s internals to find corruptions.

hbase hbck

hbase hbck -details

Page 40: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

40

Tables are sharded into regions

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

0000000000

1111111111

2222222222

3333333333

4444444444

5555555555

6666666666

7777777777

0000000000

1111111111

2222222222

3333333333

4444444444

5555555555

6666666666

7777777777

[‘’, A)

[A, B)

[B, ‘’)

Invariants: Maintain table integrity and region consistency !

Page 41: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

41

Table Integrity Invariants

• Every key shall get assigned to a single region.

• Table Regions shall:• Cover the entire range of possible

keys,• from the absolute start (‘’) • to the absolute end (unfortunately,

also ‘’).

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

[‘ ‘,A)[A,B)

[B, C)

[C, D)

[D, E)

[E, F)

[F, G)

[G, ‘ ‘)

Page 42: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

42

Region Consistency Invariants

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

.regioninfo in HDFS

Assigned onRegion server

info:regioninfo in META

RegionConsistent

Orphans

Page 43: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

43Copyright 2012 Cloudera Inc. All rights reserved

Repairing internal problems with hbck

• Newer and upcoming versions of HBase include an hbck that can fix internal problem as well as detect.• 0.90.7 • 0.92.2• 0.94.0• CDH3u4+ • CDH4b2+

Look’s like you’ve broken an invariant

Page 44: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

44

Bad region assignment

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

.regioninfo in HDFS

Assigned onRegion server

RegionConsistent

info:regioninfo in META

.regioninfo in HDFS

hbck -fix (0.90.x)hbck –fixAssignments (0.90.7+, 0.92.2+, 0.94+)

Orphans

Page 45: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

45

Region not in META

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

.regioninfo in HDFS

Assigned onRegion server

info:regioninfo in META

RegionConsistent

.regioninfo in HDFS

Orphans

hbck –fixAssignments -fixMeta

Page 46: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

46

Assigned onRegion server

Regioninfo not in HDFS

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

.regioninfo in HDFS

Assigned onRegion server

info:regioninfo in META

RegionConsistent

info:regioninfo in META

Assigned onRegion server

.regioninfo in HDFS

Orphans

hbck –fixAssignments -fixMeta

Page 47: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

47

Table Regions must not have holes

• Where to I put row key “CRUD”?• Where is region [C,D)?

• Repair: • Find the orphan and adopt it.• Fabricate a new region to fill the

hole

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

[‘ ‘,A)[A,B)

[B, C)

[D, E)

[E, F)

[F, G)

[G, ‘ ‘)

?

# NOTE! HBase should be idle (no get/put/split/compacts)hbck –fixHdfsHoles –fixHdfsOrphans –fixAssignments -fixMeta

Page 48: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

48

Table Regions must not overlap

• Hm.. Which region should “BAD” go?

• Is it [B, D) or is it [B,C)?• Likely due to a bad split.• Repair:• Merge regions or,• Sideline and bulk load

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

[‘ ‘,A)[A,B)

[B, D) [B,C)

[C, D)

[D, E)

[E, F)

[F, G)

[G, ‘ ‘)

??

# NOTE! HBase should be idle (no get/put/split/compacts)hbck –fixHdfsOverlaps –fixAssignments -fixMeta

Page 49: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

49

Assigned onRegion server

Consistency problem summary

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

.regioninfo in HDFS

info:regioninfo in META

RegionConsistent

Orphans

hbck –fixAssignments –fixMeta –fixHdfsHoles –fixHdfsOrphans –fixHdfsOverlaps

Page 50: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

50

Investigating further

• HFile – examine contents of HFiles• Hlog – examine contents of HLog file• OfflineMetaRepair – Rebuild meta table from file

system.• Also, some scripts for manual repairs:

https://github.com/jmhsieh/hbase-repair-scripts

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

Page 51: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

51

Outline

• Preventative HBase Medicine: • Tips for a healthy HBase

• The HBase Triage:• Fixes for acute HBase pains

• The HBase Surgery:• Repairing a Corrupted HBase

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved

Page 52: 4 supporting h base   jeff, jon, kathleen - cloudera - final 2

52

Questions?

HBaseCon 2012. 5/22/12Copyright 2012 Cloudera Inc. All rights reserved