Starfish: A Self-Tuning System for Big Data Analytics

STARFISH: A SELF-TUNING SYSTEM FOR BIGDATA ANALYTICS SEMINAR BY Y.SAI PRAMODA 10191A0511

Upload: sai-pramoda

Post on 18-Jul-2015


Page 1: Starfish-A self tuning system for bigdata analytics

STARFISH: A SELF-TUNING SYSTEM FOR BIGDATA ANALYTICS

SEMINAR BY

Y.SAI PRAMODA

10191A0511

Page 2: Starfish-A self tuning system for bigdata analytics

CONTENTS

• Introduction to Big data

• Hadoop

• Tuning problems

• Starfish Architecture

• Usage of Starfish

• Conclusion

Page 3: Starfish-A self tuning system for bigdata analytics

INTRODUCTION TO BIG DATA

Big data is the term for data sets so large and complex that they become difficult to process with traditional data management tools or processing applications.

What are the tools of Big data?

Features of Big data Analytics

Page 4: Starfish-A self tuning system for bigdata analytics

BIG DATA PRACTITIONERS

• Data analysts: report generation, data mining, ad optimization

• Computational scientists: computational biology, economics, journalism

• Statisticians and machine-learning researchers

• Systems researchers, developers, and testers: distributed systems, networking, security, …

Page 5: Starfish-A self tuning system for bigdata analytics

Practitioners want a MAD system: Hadoop

Hadoop is as MAD as it is!

• Magnetism: "attracts" or welcomes all sources of data, regardless of structure, values, etc.

• Agility: adaptive; remains in sync with rapid data evolution and modification

• Depth: more than just typical analytics; the system needs to support complex operations such as statistical analysis and machine learning

Page 6: Starfish-A self tuning system for bigdata analytics

MADDER

• Data-lifecycle awareness: do more than just queries; optimize the movement, storage, and processing of big data

• Elasticity: dynamically adjust resource usage to workload and user requirements

• Robustness: provide storage and querying services even in the event of some failures

Page 7: Starfish-A self tuning system for bigdata analytics

Tuning Challenges

• Heavy use of programming languages for MapReduce programs

• Data loaded/accessed as opaque files

• Large space of tuning choices

• Elasticity is wonderful, but hard to achieve

• Terabyte-scale data cycles.
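To give a feel for the "large space of tuning choices", here is a minimal sketch that counts the combinations for just four job-level settings. The parameter names are Hadoop 1.x job settings; the candidate values are assumptions chosen for illustration, not recommendations from the slides.

```python
from itertools import product

# Parameter names are Hadoop 1.x job settings; the candidate values
# below are illustrative assumptions, not tuned recommendations.
CANDIDATES = {
    "io.sort.mb": [50, 100, 200, 400],
    "io.sort.factor": [10, 50, 100],
    "mapred.reduce.tasks": [1, 8, 16, 32, 64],
    "mapred.compress.map.output": [False, True],
}


def enumerate_configs(candidates):
    """Yield every combination of candidate values as a dict."""
    names = list(candidates)
    for values in product(*(candidates[name] for name in names)):
        yield dict(zip(names, values))


configs = list(enumerate_configs(CANDIDATES))
print(len(configs))  # 4 * 3 * 5 * 2 = 120 combinations
```

Four parameters with a handful of values each already give 120 configurations; trying each one by running the job is impractical, and the count grows multiplicatively with every parameter added.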

Page 8: Starfish-A self tuning system for bigdata analytics

Tuning Problems

• Job-level MapReduce configuration

• Workflow optimization (diagram: a workflow of jobs J1, J2, J3, J4)

• Workload management

• Data layout tuning

• Cluster sizing

Page 9: Starfish-A self tuning system for bigdata analytics

Starfish’s Core Approach to Tuning

• Profiler: collects concise summaries of execution

• What-if Engine: estimates impact of hypothetical changes on execution

• Optimizers: search through space of tuning choices (job, workflow, workload, data layout, cluster)
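The profile, what-if, optimize loop can be sketched in a few lines. This is a toy illustration only: the JobProfile fields, the cost formula in what_if, and the per-reducer overhead are all assumptions invented for this sketch and are not Starfish's actual models or API.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class JobProfile:
    """Hypothetical summary a profiler might record from one job run."""
    input_mb: float           # total input size
    map_mb_per_sec: float     # observed throughput of one map task
    shuffle_mb: float         # map output sent to the reducers
    reduce_mb_per_sec: float  # observed throughput of one reduce task


def what_if(profile, map_slots, reduce_tasks):
    """Toy estimate of runtime (seconds) under a hypothetical setting."""
    map_time = profile.input_mb / (profile.map_mb_per_sec * map_slots)
    reduce_time = profile.shuffle_mb / (profile.reduce_mb_per_sec * reduce_tasks)
    startup = 0.5 * reduce_tasks  # assumed fixed overhead per reducer
    return map_time + reduce_time + startup


def optimize(profile, map_slot_options, reduce_task_options):
    """Search the settings and return the one with the lowest estimate."""
    best = min(
        product(map_slot_options, reduce_task_options),
        key=lambda setting: what_if(profile, *setting),
    )
    return best, what_if(profile, *best)


profile = JobProfile(input_mb=10000, map_mb_per_sec=5,
                     shuffle_mb=4000, reduce_mb_per_sec=2)
best_setting, predicted = optimize(profile, [10, 20, 40], [4, 16, 64])
print(best_setting, predicted)
```

The point of the split is that the job runs once to produce a profile; every other setting is evaluated by the estimate rather than by another run on the cluster.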

Page 10: Starfish-A self tuning system for bigdata analytics

THE STARFISH PHILOSOPHY

• Goal: A high-performance MAD system

• Build on Hadoop’s strengths

• How can users get good performance automatically?

Page 11: Starfish-A self tuning system for bigdata analytics

STARFISH ARCHITECTURE

Page 12: Starfish-A self tuning system for bigdata analytics

VISUALIZE WITH STARFISH

• See how MapReduce apps are working

• Understand Bottlenecks in Hadoop

• Find Misconfigured Hadoop Parameters

• Learn to develop MapReduce apps

Page 13: Starfish-A self tuning system for bigdata analytics

OPTIMIZE WITH STARFISH

• Tune Hadoop easily

• Find optimal parameter settings for MapReduce applications

Page 14: Starfish-A self tuning system for bigdata analytics

STRATEGIZE WITH STARFISH

• Make intelligent resource allocation choices for Hadoop.

• Find the right instances for workloads.

• Meet time and cost budgets with ease.
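A resource allocation choice under time and cost budgets can be sketched as a filter followed by a minimum. Everything here is hypothetical: the instance labels, prices, and predicted runtimes are made-up inputs, and the predicted hours are assumed to come from some what-if style estimate rather than from any real Starfish output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterOption:
    """One candidate cluster; all values are illustrative assumptions."""
    instance_type: str          # hypothetical label, not a real offering
    nodes: int
    price_per_node_hour: float
    predicted_hours: float      # assumed output of a runtime estimate

    @property
    def cost(self):
        return self.nodes * self.price_per_node_hour * self.predicted_hours


def choose_cluster(options, max_hours, max_cost):
    """Return the cheapest option within both budgets, or None."""
    feasible = [
        o for o in options
        if o.predicted_hours <= max_hours and o.cost <= max_cost
    ]
    return min(feasible, key=lambda o: o.cost, default=None)


OPTIONS = [
    ClusterOption("small", 10, 0.10, 8.0),
    ClusterOption("small", 20, 0.10, 4.5),
    ClusterOption("large", 10, 0.40, 2.0),
    ClusterOption("large", 20, 0.40, 1.2),
]

choice = choose_cluster(OPTIONS, max_hours=5.0, max_cost=9.5)
print(choice)
```

With these made-up numbers the cheapest option overall misses the time budget and the fastest one exceeds the cost budget, so the choice falls on a cluster that satisfies both.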

Page 15: Starfish-A self tuning system for bigdata analytics

STEPS TO USE STARFISH

Page 16: Starfish-A self tuning system for bigdata analytics

Contd…

• First step: collect the profiling data from your Hadoop cluster.

• Second step: import the profiling data into the profile store.

• Third step: launch the graphical or command-line interface to invoke the visualize, optimize, and strategize features.

Page 17: Starfish-A self tuning system for bigdata analytics

CONCLUSION

Hadoop is now a viable competitor to existing systems for big data analytics.

Starfish fills a different void by enabling Hadoop users and applications to get good performance automatically throughout the data lifecycle in analytics.

Page 18: Starfish-A self tuning system for bigdata analytics

REFERENCES

• Herodotou, Herodotos, et al. "Starfish: A self-tuning system for big data analytics." Proc. of the Fifth CIDR Conf. 2011.

• Dong, Fei. Extending Starfish to Support the Growing Hadoop Ecosystem. Diss. Duke University, 2012.

• Herodotou, Herodotos, Fei Dong, and Shivnath Babu. "MapReduce programming and cost-based optimization? Crossing this chasm with Starfish." Proceedings of the VLDB Endowment 4.12 (2011).

• http://www.cs.duke.edu/starfish/

• http://www.youtube.com/watch?v=Upxe2dzE1uk
