8/11/2019 Hadoop Tuning Guide-Version5
Rev 1.0 October 2012
Hadoop Performance Tuning Guide
2012 , . .
T A MD , I . ( AMD ) . AMD
. T
. N , , , , . E
AMD S T C S , AMD , ,
, , , ,
.
AMD , , ,
, AMD
, , .AMD
.
A U R L (URL ) I .
AMD, AMD A , , AMD A , AMD O , 3DN !, AMD V AMD A M D , I .
L L T .SPEC, SPEC SPEC S P E C .O
.
REVISION HISTORY
1.0 INTRODUCTION
    1.1 INTENDED AUDIENCE
    1.2 CHALLENGES INVOLVED IN TUNING HADOOP
    1.3 MONITORING AND PROFILING TOOLS
    1.4 METHODOLOGY AND EXPERIMENT SETUP
2.0 GETTING STARTED
    2.1 CORRECTNESS OF HARDWARE SETUP
    2.2 UPGRADING SOFTWARE COMPONENTS
    2.3 PERFORMING STRESS TESTS
    2.4 ENSURING HADOOP JOB COMPLETION
        2.4.1 OS PARAMETERS
        2.4.2 HADOOP PARAMETERS
3.0 PERFORMANCE TUNING
    3.1 HADOOP CONFIGURATION TUNING
        3.1.1 BASELINE CONFIGURATION
        3.1.2 DATA DISK SCALING
        3.1.3 COMPRESSION
        3.1.4 JVM REUSE POLICY
        3.1.5 HDFS BLOCK SIZE
        3.1.6 MAP SIDE SPILLS
        3.1.7 COPY/SHUFFLE PHASE TUNING
        3.1.8 REDUCE SIDE SPILLS
        3.1.9 POTENTIAL LIMITATIONS
    3.2 JVM CONFIGURATION TUNING
        3.2.1 JVM FLAGS
        3.2.2 JVM GARBAGE COLLECTION
    3.3 OS CONFIGURATION TUNING
        3.3.1 TRANSPARENT HUGE PAGES
        3.3.2 FILE SYSTEM CHOICE AND ATTRIBUTES
        3.3.3 IO SCHEDULER CHOICE
4.0 FINAL WORDS AND CONCLUSION
5.0 RESOURCES
6.0 REFERENCES
REVISION HISTORY
1.0 O 2012 S J I
1.0 INTRODUCTION
H 1 J M R . H M R
60.2% 2011 2016 2 . F , H
, , . UH H , H
. I , H . U
5.6X . W OS, JVM H .
S 1 I , H , , H
.
S 2
H . S , , S , OS H
H .
S 3 H , JVM, OS , H , T S
.
S 4 . S 5 S 6 .
1.1 INTENDED AUDIENCE
T H
H . S A H
.
T H .
1.2 CHALLENGES INVOLVED IN TUNING HADOOP
W H ?
H S . B
H . P H H , JVM, OS, , ,
BIOS . H . O
H . S . T ,
. A , . O , H
.
1.3 MONITORING AND PROFILING TOOLS
W H ?
G N :G 3 N 4 CPU , ,
. T H : H
. L OS dstat , vmstat , iostat , netstat free
. T H .
H V 5 A H H
H . J P : H H ,AMD C A 6 O S
S : P A 7 J H H .
S L 8 OP 9 .
1.4 METHODOLOGY AND EXPERIMENT SETUP
T H .
3 AMD OTM . S 10 11 12 . T
H H
T T S 13 1TB T G . N H
. T
.
T H :
4 (D N T T ), 1 (N N , S N N J T ): 2 /16 AMD O TM 6386 SE 2.8GH
16 8 GB DDR3 1600 MH ECC RAM 8 T MK2002TSKB 2TB @7200 SATA 1 LSI M RAID SAS 9265 8 RAID 1 1G E R H E L S 6.3 (S ) 2.6.32 279.5.2. 6. 86 64 O J (TM) SE R E ( 1.7.0 05 06) J H S (TM) 64 B S
( 23.1 03, ) C D H (CDH) 4.0.1 ( 2.0.0 1 4.0.1)
:// . . / / /MAPREDUCE 2374 .
2.0 GETTING STARTED
T H , 10 11 H ,
H . H H / S
H ?
B , H :
V . U . P / . T OS H H
N H N
. I .
2.1 CORRECTNESS OF HARDWARE SETUP
A , BIOS, , OS DIMM , , .
T . T
H .
T BIOS. B BIOS
BIOS . I BIOS .
S RAID/ , , , DIMM . W
S 2.3. U . O IO
H . F .
S AMD O TM 6200 S P L T G 14
. S , DIMM , DIMM , NUMA
OS STREAM 15 . I STREAM O TM 6200 S P L T G 14 .
T , H
F .
U , BIOS, .
P .
2.2 UPGRADING SOFTWARE COMPONENTS
T L OS , L ,
, JDK H/ H H . T
H H .
P , L H . D
OS . T L
H
T H . A . I ,
ISA W H JVM
H .
O , L , , ,
. A , L S 16 LZO 17 /
.
T ,
U L / H .
U H .
U JVM H .
2.3 PERFORMING STRESS TESTS
S H H . T /
. W H H . W
. F :
STREAM NUMA .
IO IO .
N H DFSIO, NNB MRB
H . T H . O SPEC , SPEC SPEC
.
2.4 ENSURING HADOOP JOB COMPLETION
D H OS H
M /R / H . I T S ,
:
2.4.1 OS PARAMETERS
T (FD) ulimit FD H . T
. W 32768 I H
S net.core.somaxconn L . T 128. I ,
. W 1024 .
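A minimal sketch of how the two OS limits discussed in this section could be checked from Python. The targets 32768 (open file descriptors) and 1024 (net.core.somaxconn) are the values quoted in the text. The function names, and reading the listen backlog from /proc/sys/net/core/somaxconn, are illustrative assumptions and not part of the original guide.

```python
RECOMMENDED_NOFILE = 32768     # file descriptor limit quoted in the text
RECOMMENDED_SOMAXCONN = 1024   # net.core.somaxconn value quoted in the text


def check_os_limits(nofile_soft, somaxconn):
    """Return a list of warnings for limits that are below the targets.

    A negative nofile_soft is treated as "unlimited".
    """
    warnings = []
    if 0 <= nofile_soft < RECOMMENDED_NOFILE:
        warnings.append(
            "open files limit %d is below %d" % (nofile_soft, RECOMMENDED_NOFILE)
        )
    if somaxconn < RECOMMENDED_SOMAXCONN:
        warnings.append(
            "net.core.somaxconn %d is below %d" % (somaxconn, RECOMMENDED_SOMAXCONN)
        )
    return warnings


def read_current_limits():
    """Read the soft FD limit of this process and the kernel listen backlog (Linux)."""
    import resource  # Unix-only module, imported lazily

    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    with open("/proc/sys/net/core/somaxconn") as f:
        somaxconn = int(f.read().strip())
    return soft, somaxconn
```

On a Linux node, check_os_limits(*read_current_limits()) reports which of the two settings still needs to be raised for the user running the check. The FD limit is per user and per process, so the check is most useful when run as the account that starts the Hadoop daemons.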
2.4.2 HADOOP PARAMETERS
D H M /R mapred.task.timeout H
. . T 600 . I , . N
, / H . T
. I java.net.SocketTimeoutException
/ dfs.socket.timeout dfs.datanode.socket.write.timeout .
. A .
3.0 PERFORMANCE TUNING
O H . C
H H , JVM, OS H . I , T S
H . T
. W . T
H , JVM, OS. N H H S 1.3.
3.1 HADOOP CONFIGURATION TUNING
I H
.
A T S . I M T S 1TB IO 1TB IO
IO . T R IO 1TB IO , IO
R . O , T S IO . D H
.
3.1.1 BASELINE CONFIGURATION
T M /R J H. T . T CPU
T J H , M /R , / , IO
. A :
A. S
B. C M R M CPU R
C. C J M R JVM
M /R .
T H mapred.map.tasks ,mapred.tasktracker.map.tasks.maximum, mapred.reduce.tasks,mapred.tasktracker.reduce.tasks.maximum, mapred.map.child.java.opts , mapred.reduce.child.java.opts . . U ,
, :
A. U 4 D N
B. A 2 M 1 R C. A 1GB J M R JVM .
T 3 JVM 3GB . W 4GB RAM . T 1GB OS . U
. N T S . B H .
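The memory arithmetic in the baseline above (2 Map slots and 1 Reduce slot, 1GB of Java heap each, 3GB used out of 4GB, 1GB left over) can be written as a small helper. This is a sketch only: the function name is hypothetical, and the unit of hardware that the 4GB figure refers to is not legible in this copy of the text, so it is passed in as a plain number.

```python
def jvm_heap_budget(map_slots, reduce_slots, heap_gb_per_jvm, ram_gb):
    """Return (heap_gb_used_by_task_jvms, gb_left_for_everything_else)."""
    used = (map_slots + reduce_slots) * heap_gb_per_jvm
    return used, ram_gb - used


# Baseline numbers from the text: 2 Map + 1 Reduce slots, 1GB heap each, 4GB RAM.
used_gb, left_gb = jvm_heap_budget(map_slots=2, reduce_slots=1,
                                   heap_gb_per_jvm=1, ram_gb=4)
```

A negative second value would mean the slot and heap settings oversubscribe memory. Java heap is only part of a JVM's footprint, so the leftover figure is an upper bound on what remains for the OS and the Hadoop daemons.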
3.1.2 DATA DISK SCALING
T . O mapred.local.dir .
dfs.name.dir dfs.data.dir . H. G IO T S
. F 1 T .
Figure 1: TeraSort performance scaling with number of data disks
3.1.3 COMPRESSION
H 3 , M R. I
. S T S R
. T M . E IO CPU
. T , / H . T
mapred.compress.map.output, mapred.map.output.compression.codec,mapred.output.compress, mapred.output.compression.type,
[Figure 1 data, in source order: bar values 100.00 %, 77.40 %, 53.61 %; x-axis categories 4, 5, 7 (data disks)]
mapred.output.compression.codec . . T CPU CPU
IO /
T F 2 M T S .
Figure 2: Effect of Map output compression using different codecs on TeraSort performance
W 28% S . T S F . S (1.5%) LZO
.
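As an illustration of the Map output compression settings named in this section, the following sketch renders two of them in the property format used by Hadoop configuration files. The property names come from the text. The helper name is hypothetical, and the choice of org.apache.hadoop.io.compress.SnappyCodec is an assumption based on Snappy being one of the two codecs compared in Figure 2.

```python
import xml.etree.ElementTree as ET


def hadoop_site_xml(properties):
    """Render a dict of settings as a Hadoop <configuration> XML string."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Enable compression of intermediate Map output with the Snappy codec.
map_output_compression = {
    "mapred.compress.map.output": "true",
    "mapred.map.output.compression.codec":
        "org.apache.hadoop.io.compress.SnappyCodec",
}
xml_text = hadoop_site_xml(map_output_compression)
```

The generated fragment would normally be merged into mapred-site.xml or passed per job. The codec's native library also has to be installed on every node for the setting to take effect.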
3.1.4 JVM REUSE POLICY
H mapred.job.reuse.jvm.num.tasks
M /R JVM 1 . T . 1 JVM
. S 1 JVM . E JVM JVM
JVM J JIT . JVM
. W 2% JVM .
3.1.5 HDFS BLOCK SIZE
E M . Tmapred.min.split.size ( . ),dfs.block.size ( . ) mapred.max.split.size ( . )
. T M H . F
T S HDFS dfs.block.size . I H M
HDFS . R M
[Figure 2 data, in source order: bar values 100.00 %, 64.14 %, 64.16 %; bar labels as extracted: N C, S C, LZO C]
M JVM . I R . L M . I
M M . NM HDFS
. F 3 . W 256M .
Figure 3: TeraSort performance comparison with different HDFS block sizes
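The relationship between mapred.min.split.size, mapred.max.split.size, dfs.block.size and the number of Map tasks can be sketched as follows. The split-size rule max(min, min(max, block)) is the one used by Hadoop's FileInputFormat. The function names are hypothetical, the input is treated as a single large file, and "1TB" is assumed here to mean 1024^4 bytes, so the task counts are estimates rather than figures from the original experiments.

```python
MB = 1024 * 1024


def split_size(block_size, min_split=1, max_split=2**63 - 1):
    """Input split size chosen from the block size and the min/max split limits."""
    return max(min_split, min(max_split, block_size))


def estimated_map_tasks(input_bytes, block_size, min_split=1, max_split=2**63 - 1):
    """Approximate number of Map tasks: input size divided by split size, rounded up."""
    size = split_size(block_size, min_split, max_split)
    return -(-input_bytes // size)  # integer ceiling division


one_tb = 1024 ** 4
maps_by_block_size = {
    mb: estimated_map_tasks(one_tb, mb * MB) for mb in (64, 128, 256, 384)
}
```

Under these assumptions a 64MB block size yields 16384 Map tasks and a 256MB block size yields 4096, which is why a larger block size reduces per-task start-up and scheduling overhead for a fixed input.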
3.1.6 MAP SIDE SPILLS
W M . T
M JVM . T 100 MB. T io.sort.mb ( . ) . A . B 0.05 (5%) io.sort.mb
5MB. T io.sort.record.percent . . E 16 . T 327680
C . T
io.sort.spill.percent . 0.8 (80%) .
S M ( ) . T M 304 . I J M
. I
. . .. A M M
S R J T M. I M
A M
. Tio.sort.mb io.sort.spill.percent 0.99
[Figure 3 data, in source order: bar values 100.00 %, 87.23 %, 80.37 %, 82.06 %; HDFS block sizes 64MB, 128MB, 256MB, 384MB]
M N
M . F , HDFS 256MB 100
io.sort.mb 316MB, io.sort.record.percent 0.162 (16.2% 316MB) io.sort.spill.percent 0.99 (99% ) M . W
2.64% M
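The buffer arithmetic in this section can be checked with a short sketch. It models the sort buffer as io.sort.mb split into a record-accounting area (io.sort.record.percent, 16 bytes per record as stated in the text) and a data area, with a spill triggered when either area reaches io.sort.spill.percent. The function names are hypothetical, and treating the Map output as exactly one 256MB block of 100-byte records is a simplifying assumption.

```python
import math

METADATA_BYTES_PER_RECORD = 16  # per-record accounting size quoted in the text
MB = 1024 * 1024


def sort_buffer_capacity(io_sort_mb, record_percent, spill_percent):
    """Return record and byte thresholds at which the Map side sort buffer spills."""
    total = io_sort_mb * MB
    record_area = round(total * record_percent)
    data_area = total - record_area
    max_records = record_area // METADATA_BYTES_PER_RECORD
    return {
        "max_records": max_records,
        "spill_records": int(max_records * spill_percent),
        "spill_bytes": int(data_area * spill_percent),
    }


def fits_without_spill(map_output_bytes, record_size,
                       io_sort_mb, record_percent, spill_percent):
    """True if one Map task's output stays below both spill thresholds."""
    cap = sort_buffer_capacity(io_sort_mb, record_percent, spill_percent)
    records = math.ceil(map_output_bytes / record_size)
    return (records <= cap["spill_records"]
            and map_output_bytes <= cap["spill_bytes"])


defaults = sort_buffer_capacity(100, 0.05, 0.8)
tuned_ok = fits_without_spill(256 * MB, 100, io_sort_mb=316,
                              record_percent=0.162, spill_percent=0.99)
default_ok = fits_without_spill(256 * MB, 100, io_sort_mb=100,
                                record_percent=0.05, spill_percent=0.8)
```

With the default values the record area holds 327680 records, matching the figure in the text, and a 256MB Map output does not fit. With io.sort.mb at 316MB, io.sort.record.percent at 0.162 and io.sort.spill.percent at 0.99, both thresholds are cleared in this model. The Map JVM heap has to be large enough to hold the enlarged buffer.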
3.1.7 COPY/SHUFFLE PHASE TUNING
I R M . T
:
T mapred.reduce.parallel.copies . 5 . T
. T T T
tasktracker.http.threads 40 . T T T . O .
C dfs.datanode.handler.count ( . ),dfs.namenode.handler.count ( . ) mapred.job.tracker.handler.count ( . )
. R . T
R . I R .
N . U
.
3.1.8 REDUCE SIDE SPILLS
T R H . R IO H
. A , M / , HDFS. T
R H , J R JVM M JVM .
O M , M R T T . T M
T T . A , mapred.job.shuffle.input.buffer.percent
. , , M . O , M . T mapred.job.shuffle.input.buffer.percent 0.70 . T
70% R JVM M . W ( mapred.job.shuffle.merge.percent
. 0.66 ) M O R R JVM mapred.job.reduce.input.buffer.percent . M
R .
mapred.job.reduce.input.buffer.percent 0.0 R JVM .
E , . . . . . mapred.job.reduce.input.buffer.percent IO
R . I R T S
J . I mapred.job.reduce.input.buffer.percent 1.0
W . . . . . mapred.job.reduce.input.buffer.percent 0.8. I R T S
. A 2 M . 1 M M J
R JVM . L R .
W 10% R . F 4 .
Figure 4: Effect of tuning Reduce phase Hadoop parameters on TeraSort
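To make the three Reduce side percentages concrete, the sketch below converts them into buffer sizes for a given Reduce JVM heap. The defaults 0.70, 0.66 and 0.0 and the tuned value 0.8 are taken from the text. The function name and the 1024MB heap are illustrative assumptions, and the heap actually usable by a JVM is somewhat smaller than its -Xmx setting.

```python
def reduce_buffers_mb(heap_mb, shuffle_input_pct=0.70,
                      merge_pct=0.66, reduce_input_pct=0.0):
    """Translate the Reduce side buffer percentages into megabytes.

    shuffle_input_pct: mapred.job.shuffle.input.buffer.percent
    merge_pct:         mapred.job.shuffle.merge.percent
    reduce_input_pct:  mapred.job.reduce.input.buffer.percent
    """
    shuffle_buffer = heap_mb * shuffle_input_pct
    return {
        "shuffle_buffer_mb": shuffle_buffer,
        "merge_trigger_mb": shuffle_buffer * merge_pct,
        "retained_for_reduce_mb": heap_mb * reduce_input_pct,
    }


default_layout = reduce_buffers_mb(1024)
tuned_layout = reduce_buffers_mb(1024, reduce_input_pct=0.8)
```

For a 1024MB heap the defaults give roughly 717MB of shuffle buffer, an in-memory merge that starts at about 473MB, and nothing retained in memory when the Reduce function starts. Raising mapred.job.reduce.input.buffer.percent to 0.8 allows about 819MB of Map output to stay in memory, which is only safe when the Reduce function itself needs little heap.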
3.1.9 POTENTIAL LIMITATIONS
F , H / . F , CDH 4.0.1
M /R . T . F , R
R R . A , mapred.max.tracker.failures
. T T . T T T 4 T T
. T mapred.max.tracker .failures
H . C mapred.max.tracker.failures . O ,
[Figure 4 data, in source order: bar values 100.00 %, 90.16 %; bar labels as extracted: W R, W R]
H . N .
T .
3.2 JVM CONFIGURATION TUNING
O H
JVM.
3.2.1 JVM FLAGS
JVM JVM . T
JVM . F :
A O T JVM . T JVM
3.4% . N
A O O JDK7 U 5. U C O C O O P
64 JVM J O JVM . I JVM
, . W 1% T S . N U C O O JDK 7 U 5.
U B L B O H S JDK . W 1% T S
. N U B L O JDK 7 U
3.2.2 JVM GARBAGE COLLECTION
G M R JVM JVM . H , GC
GC . O
GC M R JVM . R GC .
F 5 JVM .
Figure 5: Effect of JVM command-line options tuning on TeraSort
3.3 OS CONFIGURATION TUNING
I OS H
3.3.1 TRANSPARENT HUGE PAGES
T (THP) RHEL 6.2 . H ,
THP CPU
THP . T . W 66% T S THP . THP H . S K I W A CDH4
4.0.1 19 . A THP .
3.3.2 FILE SYSTEM CHOICE AND ATTRIBUTES
T (FS) L . G FS
FS H IO 6.3 EXT4 FS
EXT3 FS.
I , . T (
FS . W 29% T FS .
[Figure 5 data, in source order: bar values 100.00 %, 96.74 %; bar labels as extracted: W JVM, W JVM]
3.3.3 IO SCHEDULER CHOICE
M L 4 IO CFQ, , D IO , IO
IO . T IO L . F U 11.04 IO
RHEL 6.3 CFQ . A :// . . / . ? =2188323 15% CFQ
.
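A small sketch for inspecting which IO scheduler a data disk currently uses. On Linux the file /sys/block/<device>/queue/scheduler lists the available schedulers with the active one in square brackets, for example "noop anticipatory deadline [cfq]". The function names and the device name "sda" are hypothetical.

```python
import re


def active_scheduler(scheduler_file_text):
    """Return the scheduler name shown in square brackets, or None if absent."""
    match = re.search(r"\[([^\]]+)\]", scheduler_file_text)
    return match.group(1) if match else None


def read_active_scheduler(device="sda"):
    """Read the active IO scheduler of a block device from sysfs (Linux only)."""
    with open("/sys/block/%s/queue/scheduler" % device) as f:
        return active_scheduler(f.read())
```

Running read_active_scheduler for each data disk before and after a change confirms that the new scheduler was actually applied. The setting is per device and does not persist across reboots unless it is also set through the boot configuration.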
C F 6 OS
Figure 6: Effect of OS configuration tuning on TeraSort performance
[Figure 6 data, in source order: bar values 100.00 %, 77.39 %; bar labels as extracted: W OS, W OS]
4.0 FINAL WORDS AND CONCLUSION
B D H . T H . T
H H
. U H
H .
I H . O T S 5.6X
(4 F 1) . W 3X ( F 1) .
F 7 :
Figure 7: Total improvements in TeraSort performance through configuration tuning
S H .
[Figure 7 data, in source order: bar values 5.60 X, 3.00 X, 1.00 X; bar labels as extracted: B 4, B 7, T]
5.0 RESOURCES
AMD D C J Z ::// . . /R / / /P / .
AMD D T : :// . . / AMD O TM 6200 S P L T G ::// . . / /2012/04/25/
%E2%80%9C %E2%80%9D / 5 5% H :
:// . . / /2011/10/20/5 5 / J G C C T G A H T S
:// . . /R / / /P / . A :
:// . . / . ? =2188323 A H P M B P
: :// . . / /2011/07/12/%E2%80%93 /
M R E T ::// . . / /2012/06/06/ %E2%80%93/
M R O M R E ::// . . / /2012/05/29/
/ O J A P S B P :
:// . . / /2012/04/25/%E2%80%9C %E2%80%9D /
6.0 REFERENCES
[1] "W A H ," O . A : :// . . /. A 07 O 201
[2] "IDC R F W H M R E S F , S G WC A T T D US23471212," O . A :
:// . . / . ? I = US23471212. A 07 O 2012 .
[3] "G M S ," O . A : :// . . /. A 07 O2012 .
[4] "N T I S IT I M ," O . A : :// .A 07 O 2012 .
[5] "V G ," O . A : :// . . / / / / . .A 07 O 2012 .
[6] "AMD C A P A AMD D C ," O . A ::// . . / / /C A / / . . A 07 O 2012 .
[7] "O S S 12.2: P A O S S 12.2: P AO . A : :// . . / /E18659 01/ /821 1379/ . . A 07 O
2012 .
[8] "M P P W ," O . A : :// . . . / . /M P . AO 2012 .
[9] "A OP ," O . A : :// . . / /. A 07 O
[10] T. W , H : T D G , S : O'R M , I ., 2010.
[11] "7 T I M R P A H E C ," OA : :// . . / /2009/12/7 /.
A 07 O 2012 .
[12] "D B : S H P 1: M I ," O . A ::// . . /2011/01/ 1 . . A 07 O
2012 .
[13] O. O'M . O . A : :// . /Y H . . A 07 O 201
[14] "D G M AMD D C ," O . A ::// . . /A /51803A O L T G SCREEN. . A 07 O
2012 .
[15] "MEMORY BANDWIDTH: STREAM BENCHMARK PERFORMANCE RESULTS," O . A ::// . . . / /. A 09 O 2012 .
[16] "5 5% H AMD D C ," O . A:// . . / /2011/10/20/5 5 /.
A 09 O 2012 .
[17] " . : LZO ," O . A ::// . . / / /. A 09 O 2012 .
[18] " #MAPREDUCE 2374 "T F B " MR ASF JIRA," O . A ::// . . / / /MAPREDUCE 2374. A 09 O 2012 .
[19] "CDH4 R N C S ," O . A ::// . . / /CDH4DOC/CDH4+R +N . A 09 O 2012 .