Let's Talk Operations! (Hadoop Summit 2014)

Let’s Talk Operations! Allen Wittenauer

Upload: allen-wittenauer

Post on 26-Jan-2015


DESCRIPTION

These are the introductory slides I used (in some form or another) for the Let's Talk Operations! sessions for the 2014 Hadoop Summits. No video for this one!

TRANSCRIPT

Page 1: Let's Talk Operations! (Hadoop Summit 2014)

Let’s Talk Operations!
Allen Wittenauer

Page 2: Let's Talk Operations! (Hadoop Summit 2014)
Page 3: Let's Talk Operations! (Hadoop Summit 2014)

Twitter: @_a__w_ Email: aw @ apache.org

Page 4: Let's Talk Operations! (Hadoop Summit 2014)

How many individual grids should I have?

Page 5: Let's Talk Operations! (Hadoop Summit 2014)

One big grid

• Pros
  • Lower ops overhead
  • One location for all data
• Cons
  • Dev and Prod on one system

Grid per project

• Pros
  • Capacity planning per project
• Cons
  • More headcount to maintain
  • Multiple copies of data
  • Data ingress is a mess

Page 6: Let's Talk Operations! (Hadoop Summit 2014)

[Diagram: one data center containing Production, ETL, and Development grids]

Page 7: Let's Talk Operations! (Hadoop Summit 2014)

[Diagram: ETL, Dev, and Prod grids. Labels: Event Feeds, Database Feeds, Base ETL Pull (appears three times), Post-Processed Data]

Page 8: Let's Talk Operations! (Hadoop Summit 2014)

[Diagram: two data centers, DC1 and DC2, with Production, ETL, and Development grids]

Page 9: Let's Talk Operations! (Hadoop Summit 2014)

How do I solve some common distcp issues?

Page 10: Let's Talk Operations! (Hadoop Summit 2014)

• Common issues
  • Version incompatibilities
  • Network bandwidth consumption
• Some tricks
  • Use WebHDFS
    • All modern versions support it
    • Read and write in both directions
  • Create a separate queue with hard limits
  • Pull from larger, push from smaller
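The WebHDFS and separate-queue tricks can be sketched as a small Python helper that builds a `hadoop distcp` command line. This is only an illustration, not something from the slides. It assumes a Hadoop 2-style DistCp, where `-m` caps the number of map tasks, `-bandwidth` caps MB/s per map, and `mapreduce.job.queuename` selects the queue. The host names, the queue name `distcp`, and the WebHDFS port 50070 are hypothetical placeholders. The hard limit on the queue would be set in the scheduler configuration, which is not shown here.

```python
from typing import List


def build_distcp_command(
    src_namenode: str,
    src_path: str,
    dst_uri: str,
    queue: str,
    max_maps: int,
    bandwidth_mb_per_map: int,
    webhdfs_port: int = 50070,  # assumption: Hadoop 2 default NameNode HTTP port
) -> List[str]:
    """Return the argv for a throttled distcp that reads the source over WebHDFS."""
    if max_maps < 1 or bandwidth_mb_per_map < 1:
        raise ValueError("max_maps and bandwidth_mb_per_map must be positive")
    # Reading via webhdfs:// avoids RPC version mismatches between clusters.
    src_uri = f"webhdfs://{src_namenode}:{webhdfs_port}{src_path}"
    return [
        "hadoop", "distcp",
        f"-Dmapreduce.job.queuename={queue}",     # run in the dedicated queue
        "-m", str(max_maps),                      # cap on simultaneous copies
        "-bandwidth", str(bandwidth_mb_per_map),  # MB/s allowed per map
        src_uri,
        dst_uri,
    ]


# Hypothetical hosts and paths, for illustration only.
cmd = build_distcp_command(
    src_namenode="nn.big.example.com",
    src_path="/data/events",
    dst_uri="hdfs://nn.small.example.com:8020/data/events",
    queue="distcp",
    max_maps=20,
    bandwidth_mb_per_map=10,
)
print(" ".join(cmd))
```

With these example values the copy is bounded by roughly 20 maps × 10 MB/s = 200 MB/s, in addition to whatever limit the queue itself enforces.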

Page 11: Let's Talk Operations! (Hadoop Summit 2014)

Q&A

Allen Wittenauer
Twitter: @_a__w_ Email: aw @ apache.org

Page 12: Let's Talk Operations! (Hadoop Summit 2014)

Bonus Slide!

Page 13: Let's Talk Operations! (Hadoop Summit 2014)

• root partitioning
  20 GB /, ... | 200 GB task space | (rest) HDFS

• non-root partitioning
  5 GB swap | 200 GB task space | (rest) HDFS
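To see how much raw HDFS space such a layout leaves on one node, the arithmetic can be sketched in Python. The 20 GB, 5 GB, and 200 GB figures come from the slide. Everything else is an assumption made for illustration: one root disk per node, all disks the same size, and the elided "..." on the root disk modeled as a caller-supplied `root_other_gb` (default 0). The function name is hypothetical.

```python
ROOT_FS_GB = 20       # "/" on the root disk (from the slide)
SWAP_GB = 5           # swap on each non-root disk (from the slide)
TASK_SPACE_GB = 200   # task space on every disk (from the slide)


def hdfs_space_per_node_gb(
    disk_size_gb: float,
    disk_count: int,
    root_other_gb: float = 0.0,  # assumption: stands in for the slide's "..."
) -> float:
    """Raw HDFS space on one node, before replication, in GB."""
    if disk_count < 1:
        raise ValueError("a node needs at least one disk")
    root_disk_hdfs = disk_size_gb - ROOT_FS_GB - root_other_gb - TASK_SPACE_GB
    non_root_hdfs = disk_size_gb - SWAP_GB - TASK_SPACE_GB
    if root_disk_hdfs < 0 or non_root_hdfs < 0:
        raise ValueError("disks are too small for this layout")
    # One root disk plus (disk_count - 1) non-root disks.
    return root_disk_hdfs + (disk_count - 1) * non_root_hdfs


# Hypothetical node: 4 disks of 2000 GB each.
print(hdfs_space_per_node_gb(2000, 4))  # 1780 + 3 * 1795 = 7165.0
```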