jan 2013 hug: dist cpv2 for hug 20130116

19
DistCp (v.2) and the Dynamic InputFormat Mithun RK ([email protected]) 20130116

Upload: yahoo-developer-network

Post on 07-Nov-2014

1.069 views

Category:

Documents


0 download

DESCRIPTION

 

TRANSCRIPT

Page 1: Jan 2013 HUG: Dist cpv2 for hug 20130116

DistCp (v.2) and the Dynamic InputFormatMithun RK

([email protected])

20130116

Page 2: Jan 2013 HUG: Dist cpv2 for hug 20130116

Me

Yahoo: HCatalog, Hive, GDM Firmware Engineer at Hewlett Packard Fluent Hindi Gold medal at the nationals, last year.

04/08/20232Yahoo! Presentation, Confidential

Page 3: Jan 2013 HUG: Dist cpv2 for hug 20130116

Prelude

Page 4: Jan 2013 HUG: Dist cpv2 for hug 20130116

“Legacy” DistCp

Inter-cluster file copy using Map/Reduce Command-line:

hadoop distcp –m 20 \

hftp://source_nn:50070/datasets/search/20120523/US \

hftp://source_nn:50070/datasets/search/20120523/UK \

hdfs://target_nn:8020/home/mithunr/target

Algo1. for (FileStatus f : FileSystem.globStatus(sourcePath)) { recurse(f); }

2. ~/_distCp_WIP_201301161600/file.list

3. InputSplit calculation:

1. Divide paths into ‘m’ groups (“splits”), one per mapper

2. Total file size in each split roughly equal to others.

4. Launch MR job

1. Each Map task copies files specified in its InputSplit

Source-path› http://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0/src/tools/org/apache/

hadoop/tools/DistCp.java

04/08/20234Yahoo! Presentation, Confidential

Page 5: Jan 2013 HUG: Dist cpv2 for hug 20130116

The (Unfortunate) Beginning

Page 6: Jan 2013 HUG: Dist cpv2 for hug 20130116

Data Management on Y! Clusters

Grid Data Management (GDM)› Life-cycle management for data on Hadoop Clusters› 1+ Petabytes per day.

Facets:1. Acquisition: Warehouse -> Cluster

2. Replication: Cluster -> Cluster

3. “Retention”: Eviction of old data

4. Workflow tracking, Metrics, Monitoring, User-dashboards, Configuration management, etc.

GDM Replication:1. Use DistCp!

2. Don’t re-implement MR-job for file-copy.

04/08/20236Yahoo! Presentation, Confidential

Page 7: Jan 2013 HUG: Dist cpv2 for hug 20130116

A marriage doomed to fail

Poor programmatic use:› DistCp.main(“-m”, “20”, “hftp://source_nn1:50070/source”,

“hdfs://target_nn1:8020/target”);› Blocking call› Equal-size Copy-distribution: Can’t be overridden.

Long Setup-times:› Optimization: file.list contains only files that are changed/absent on target› Compare checksums› E.g. Experiment with 200 GB dataset: 14 minutes setup time.

Atomic commit:› Example: Oozie workflow-launch on data availability› Premature consumption

› Workarounds: _DONE_ markers:• Name-node pressure

• Hacks to ignore at source

Others

04/08/20237Yahoo! Presentation, Confidential

Page 8: Jan 2013 HUG: Dist cpv2 for hug 20130116

04/08/20238Yahoo! Presentation, Confidential

Page 9: Jan 2013 HUG: Dist cpv2 for hug 20130116

04/08/20239

Page 10: Jan 2013 HUG: Dist cpv2 for hug 20130116

DistCp Redux(Available in Hadoop 0.23/2.0)

Page 11: Jan 2013 HUG: Dist cpv2 for hug 20130116

Changes

(Almost) Identical command-line: hadoop distcp –m 20 \

hftp://source_nn:50070/datasets/search/20120523/US/ \

hftp://source_nn:50070/datasets/search/20120523/UK \

hdfs://target_nn:8020/home/mithunr/target/

Reduced setup-times:› Postpone everything to MR job

› E.g. Experiment with 200 GB dataset:

• Old: 14 minutes setup time

• New: 7 seconds

Improved Copy-times:› Large dataset copy test: Time cut down from 17 hours to 7 hours.

Atomic commit:hadoop distcp –atomic –tmp /home/mithunr/tmp /source /target

Improved programmatic use:› options = new DistCpOptions(srcPaths, destPath).preserve(BLOCKSIZE).setBlocking(false);

› Job job = new DistCp(hadoopConf, options).execute();

Others› Bandwidth throttling, Asynchronous mode, Configurable copy-strategies.

04/08/202311Yahoo! Presentation, Confidential

Page 12: Jan 2013 HUG: Dist cpv2 for hug 20130116
Page 13: Jan 2013 HUG: Dist cpv2 for hug 20130116

Cost of copy

Copy-time is directly proportional to file-size› (All else being equal)

Long-tailed MR jobs› Copy twenty 2GB files between clusters. Why does one take longer than the rest?

› Hint: Sometimes, a file is slow initially, and then speeds up after a “block boundary”.

Are data-nodes equivalent?› Slower hard-drives

› Failing NICs

› Misconfiguration

Take a closer look at the command-line:hadoop distcp –m 20 \

hftp://source_nn:50070/datasets/search/20120523/US/ \

hftp://source_nn:50070/datasets/search/20120523/UK \

hdfs://target_nn:8020/home/mithunr/target/

04/08/202313Yahoo! Presentation, Confidential

Page 14: Jan 2013 HUG: Dist cpv2 for hug 20130116

Reads over hdfs://

04/08/202314Yahoo! Presentation, Confidential

Data Node

Data Node

Data Node

Data Node

DFS Client

User Program

Data Node

Page 15: Jan 2013 HUG: Dist cpv2 for hug 20130116

Reads over hftp://

04/08/202315Yahoo! Presentation, Confidential

Data Node

Data Node

Data Node

Data Node

DFS Client

User Program

Data Node

Page 16: Jan 2013 HUG: Dist cpv2 for hug 20130116

Long-tails

04/08/202316Yahoo! Presentation, Confidential

/datasets/search/US/20130101/part-00000.gz/datasets/search/US/20130101/part-00001.gz/datasets/search/US/20130101/part-00002.gz/datasets/search/US/20130101/part-00003.gz/datasets/search/US/20130101/part-00004.gz/datasets/search/US/20130101/part-00005.gz/datasets/search/US/20130101/part-00006.gz/datasets/search/US/20130101/part-00007.gz/datasets/search/US/20130101/part-00008.gz/datasets/search/US/20130101/part-00009.gz...

Input Split #1

STUCK!

SPLIT!

Page 17: Jan 2013 HUG: Dist cpv2 for hug 20130116

Mitigation

Break static binding between InputSplits and Mappers

E.g. Consider DistCp of N files with 10 mappers:

1. Don’t create 10 InputSplits. Create 20 instead.

2. Store each InputSplit as a separate file.

1. hdfs://home/mithunr/_distcp_20130116/work-pool/

3. Mapper consumes one InputSplit and checks for more.

4. Mappers quit when no more InputSplits are left.

Single file per InputSplit?› NameNode pressure.

DynamicInputFormat› Separate Library

Perf:› Worst-case is no worse than UniformSizeInputFormat

› Best-case: 17 hours -> 7 hours.

04/08/202317Yahoo! Presentation, Confidential

Page 18: Jan 2013 HUG: Dist cpv2 for hug 20130116

Future

Block-level parallelism› Stream blocks individually

› Stitch at the end: Metadata

Yarn› Master-worker paradigm

04/08/202318Yahoo! Presentation, Confidential

Page 19: Jan 2013 HUG: Dist cpv2 for hug 20130116

_DONE_