Presented by Nirupam Roy

Starfish: A Self-tuning System for Big Data Analytics

Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong, Fatma Bilgen Cetin, Shivnath Babu

Department of Computer Science Duke University

The Growth of Data

MAD: Features of Ideal Analytics System

Magnetism -- accept all data

Agility -- adapt with data, real-time processing

Depth -- allow complex analysis

Hadoop is MAD

Magnetism -- accept all data

- Blindly loads data into HDFS.

Agility -- adapt with data, real-time processing

- Fine-grained scheduler

- End-to-end data pipeline

- Dynamic node addition/dropping

Depth -- allow complex analysis

- Well integrated with programming languages

Tuning for Good Performance: Challenges

- Multiple dimensions of performance -- time, cost, scalability …

- Tons of Parameters -- more than 190 parameters in Hadoop.

- Multiple levels of abstraction -- job-level, workflow-level, workload-level …
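
To get a feel for why hand-tuning is hard, the sketch below counts the configurations in a small grid over just five parameters. The keys are Hadoop 0.20-era configuration names; the candidate values are illustrative assumptions, not recommendations from the talk or the paper.

```python
from math import prod

# Candidate values per parameter (the values are illustrative assumptions).
CANDIDATES = {
    "io.sort.mb": [50, 100, 200, 400],
    "io.sort.factor": [10, 50, 100],
    "io.sort.spill.percent": [0.6, 0.8, 0.9],
    "mapred.reduce.tasks": [1, 10, 50, 100, 200],
    "mapred.compress.map.output": [False, True],
}


def grid_size(candidates):
    """Number of distinct configurations in the full cross product."""
    return prod(len(values) for values in candidates.values())


print(grid_size(CANDIDATES))  # 4 * 3 * 3 * 5 * 2 = 360
```

Even this coarse grid over five of the 190+ parameters yields hundreds of settings, and each one would cost a full job run to measure directly.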

Tuning for Good Performance: Challenges

Rule of thumb

Starfish: A Self-tuning System

- Builds on Hadoop

- Tunes to ‘good’ performance automatically

Starfish Architecture

The “What-if” Engine

Input: profile of a job (P) + new parameter set (S)

Model + simulation based prediction algorithm

- Learning from previous job profiles

- Analytical models to estimate dataflow

- Simulating the execution of MR workload

Output: predicted performance

[Ref:] A What-if Engine for Cost-based MapReduce Optimization. H. Herodotou et al.
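
The idea of predicting performance from a job profile (P) and a new parameter set (S) can be illustrated with a toy analytical model. Everything in the sketch below is an assumption made for illustration: the profile fields, the settings, and the cost formula are hypothetical and far simpler than the models in the referenced paper.

```python
import math
from dataclasses import dataclass


@dataclass
class JobProfile:
    """Hypothetical profile P, as if measured from a previous run."""
    input_mb: float           # total input size
    map_selectivity: float    # map output MB per input MB
    map_sec_per_mb: float     # cost of reading and mapping one MB
    spill_sec_per_mb: float   # cost of sorting and spilling one MB of map output
    reduce_sec_per_mb: float  # cost of shuffling and reducing one MB


@dataclass
class Settings:
    """Hypothetical parameter set S to evaluate."""
    split_mb: float        # input handled by one map task
    sort_buffer_mb: float  # map-side sort buffer
    num_reducers: int
    map_slots: int         # map tasks that can run at once
    reduce_slots: int      # reduce tasks that can run at once


def what_if(p, s):
    """Predict running time in seconds for profile p under settings s."""
    num_maps = math.ceil(p.input_mb / s.split_mb)
    out_per_map = s.split_mb * p.map_selectivity
    spills = max(1, math.ceil(out_per_map / s.sort_buffer_mb))
    # Toy assumption: more than one spill costs one extra pass over the output.
    passes = 1 if spills == 1 else 2
    map_task = s.split_mb * p.map_sec_per_mb + out_per_map * p.spill_sec_per_mb * passes
    map_waves = math.ceil(num_maps / s.map_slots)

    reduce_in = p.input_mb * p.map_selectivity / s.num_reducers
    reduce_task = reduce_in * p.reduce_sec_per_mb
    reduce_waves = math.ceil(s.num_reducers / s.reduce_slots)
    return map_waves * map_task + reduce_waves * reduce_task
```

The point of the design is that the job never has to run under S: the same profile can be re-evaluated against many candidate settings at the cost of a function call.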

The “What-if” Engine

[Figure: ground truth vs. values estimated by the What-if engine]

Starfish Architecture: Job Level

Just-in-time optimizer

-- Searches the parameter space

Profiler

-- Collects information on MapReduce job execution through dynamic instrumentation

-- Reports timings, data sizes, and resource utilization

Sampler

-- Generates profile statistics from training benchmark jobs
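
Searching the parameter space can be sketched as repeatedly asking a cost model for a prediction and keeping the cheapest setting seen. The sketch below swaps in plain random search and a made-up cost function; the search space, the function names, and the cost formula are all hypothetical and are not the optimizer's actual strategy.

```python
import random


def random_search(cost, space, trials=200, seed=0):
    """Return (best_settings, best_cost) after sampling `trials` random points."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(trials):
        candidate = {name: rng.choice(values) for name, values in space.items()}
        c = cost(candidate)
        if c < best_cost:
            best, best_cost = candidate, c
    return best, best_cost


# Hypothetical search space and stand-in cost model.
SPACE = {
    "sort_buffer_mb": [50, 100, 200, 400],
    "num_reducers": [1, 5, 10, 20, 40],
}


def toy_cost(s):
    """Made-up predicted cost: small buffers spill more, reducers trade
    parallelism against per-task overhead."""
    spill_penalty = 400 / s["sort_buffer_mb"]
    reduce_time = 100 / s["num_reducers"] + 0.5 * s["num_reducers"]
    return spill_penalty + reduce_time


best, predicted = random_search(toy_cost, SPACE)
print(best, predicted)
```

In a real system `toy_cost` would be replaced by a call into the What-if engine, so each trial costs a prediction rather than a job run.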

Starfish Architecture: Workflow Level

Scheduler for balanced distribution of data

-- deals with skewed data, add/drop of nodes, tradeoff between balanced data vs. data locality

Block placement policy for data collocation

-- local-write vs. round-robin
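
The local-write vs. round-robin tradeoff can be made concrete with a small simulation. This is a simplified sketch under stated assumptions: node names and the skewed writer list are invented, and replication is ignored, so it does not model real HDFS placement.

```python
from collections import Counter
from itertools import cycle


def local_write(writer_nodes):
    """Place each block on the node whose task wrote it."""
    return list(writer_nodes)


def round_robin(writer_nodes, nodes):
    """Ignore the writer and spread blocks evenly over all nodes."""
    ring = cycle(nodes)
    return [next(ring) for _ in writer_nodes]


def imbalance(placement, nodes):
    """Max minus min block count per node (0 means perfectly balanced)."""
    counts = Counter(placement)
    per_node = [counts.get(n, 0) for n in nodes]
    return max(per_node) - min(per_node)


def local_fraction(placement, writer_nodes):
    """Fraction of blocks stored on the node that produced them."""
    hits = sum(p == w for p, w in zip(placement, writer_nodes))
    return hits / len(writer_nodes)


NODES = ["n1", "n2", "n3", "n4"]
# Skewed producer: most blocks are written by tasks running on n1.
WRITERS = ["n1"] * 6 + ["n2", "n3"]

for name, placement in [
    ("local-write", local_write(WRITERS)),
    ("round-robin", round_robin(WRITERS, NODES)),
]:
    print(name, imbalance(placement, NODES), local_fraction(placement, WRITERS))
```

With a skewed producer, local-write keeps every block local but piles data onto one node, while round-robin balances the layout at the cost of locality.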

Starfish Architecture: Workflow Level

[Figure labels: Producer, Consumer, Wasted production]

Starfish Architecture: Workflow Level

File-level parallelism

Block-level parallelism

Starfish Architecture: Workflow Level

Workflow-Aware Optimizer

-- Selects the best data layout and job parameters

-- Compares costs and benefits: running time? data layout?

What-if simulation

•  MR job execution

•  Task scheduling

•  Block placement

Starfish Architecture: Workload Level

Workload Optimizer

•  Jumbo operator

•  Cost-based estimation for best optimization

Elastisizer

•  Determines best cluster and Hadoop configurations
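
One way to picture what "determine the best cluster configuration" might involve is a cost-based choice among cluster sizes. The sketch below is purely illustrative: the running-time model, the per-full-hour billing rule, and all names are assumptions, not the Elastisizer's actual method.

```python
import math


def predicted_hours(work_node_hours, nodes, serial_hours=0.5):
    """Toy model: a fixed serial part plus perfectly parallel work."""
    return serial_hours + work_node_hours / nodes


def cheapest_cluster(work_node_hours, deadline_hours, price_per_node_hour, sizes):
    """Return (nodes, cost) for the cheapest size meeting the deadline, or None."""
    best = None
    for n in sizes:
        hours = predicted_hours(work_node_hours, n)
        if hours > deadline_hours:
            continue  # this size is predicted to miss the deadline
        # Assumption: each node is billed for every started hour.
        cost = n * math.ceil(hours) * price_per_node_hour
        if best is None or cost < best[1]:
            best = (n, cost)
    return best


print(cheapest_cluster(work_node_hours=40, deadline_hours=5,
                       price_per_node_hour=1, sizes=[4, 8, 16, 32]))
```

Here the smaller clusters are predicted to miss the deadline and the largest one costs more than necessary, so the choice falls in between.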

Starfish: Summary

- Optimizes at different granularities -- workload, workflow, job (procedural & declarative)

- Considers different decision points -- provisioning, optimization, scheduling, data layout

Starfish: Piazza Discussion

Top criticisms (till 1:30pm, 17 reviews):

1) Limited evaluation: 10

2) Not explained well: 7

3) Profiler overhead/better search algo: 5

* What is the effect of wrong prediction?

* What-if engine requires prior knowledge.

http://www.cs.duke.edu/starfish/

Thank you.

Photo courtesy: Starfish group, Duke University

Going MAD with Big Data

Magnetic system

Agile system and Analytics

Deep Analytics

Data Life Cycle Awareness

Elasticity

Robustness

Backup: What-if Engine 1

Backup: What-if Engine 2