Sky Agile Horizons
Hadoop at Sky
Overview
• What is Hadoop?
- Reliable, Scalable, Distributed
• Where did it come from?
- Community + Yahoo!
• Where is it now?
- Apache Software Foundation
• Why is it called “Hadoop”?
Who is using it?
To name just a few…
Is it “production” ready?
This screengrab is from one of the Hadoop clusters at Facebook (May 2010)
So, what does it give you?
Just two things...
• Distributed Filesystem (HDFS)
- Name Node
- Data Node(s)
• Distributed Processing Infrastructure
- Job Tracker
- Task Tracker(s)
HDFS - Overview
• Blocks
- 64MB chunks (configurable)
• WORM (Write once, read many)
- NO EDITS
- NO APPENDS
• Replication
- 3 copies
- direct
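As a rough illustration of the block and replication figures above, the sketch below works out how many blocks a file occupies and how much raw disk it consumes across the cluster. The function name `hdfs_footprint` and the 200MB example file are made up for illustration; only the 64MB block size and the 3 copies come from the slide. It assumes a partial final block takes up only the bytes it actually contains.

```python
BLOCK_SIZE = 64 * 1024 * 1024   # 64MB, the figure quoted on the slide (configurable)
REPLICATION = 3                 # 3 copies, as on the slide


def hdfs_footprint(file_size, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return (number of blocks, raw bytes stored across the cluster)."""
    # Integer ceiling division: a partly filled last block still counts as a block.
    blocks = (file_size + block_size - 1) // block_size
    # Every byte of the file is stored once per replica.
    raw_bytes = file_size * replication
    return blocks, raw_bytes


if __name__ == "__main__":
    size = 200 * 1024 * 1024  # hypothetical 200MB file
    blocks, raw = hdfs_footprint(size)
    print(blocks, "blocks,", raw // (1024 * 1024), "MB of raw storage")
```

For the hypothetical 200MB file this gives 4 blocks (three full, one partial) and 600MB of raw storage.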
HDFS - Read
[Diagram: Client, Name Node and Data Nodes; blocks 1 to 4 are each stored three times across the Data Nodes; Name Node: Control / Monitoring]
Client:
1. Get Metadata
2. Fetch Blocks
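The two read steps can be sketched with a toy in-memory model. This is not the Hadoop client API: the classes `NameNode` and `DataNode`, the function `read_file` and the node names `dn1` to `dn3` are all hypothetical. The point it tries to show is that the name node only hands out metadata, while the bytes come straight from whichever data node holds a replica.

```python
class DataNode:
    """Toy data node: holds block contents keyed by block id."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}


class NameNode:
    """Toy name node: holds only metadata, never file contents."""

    def __init__(self):
        # path -> list of (block_id, [names of data nodes holding a copy])
        self.files = {}

    def get_block_locations(self, path):
        return self.files[path]


def read_file(path, name_node, data_nodes):
    """Step 1: get metadata from the name node. Step 2: fetch blocks from data nodes."""
    chunks = []
    for block_id, holders in name_node.get_block_locations(path):
        # Use the first listed replica that actually has the block.
        for name in holders:
            node = data_nodes[name]
            if block_id in node.blocks:
                chunks.append(node.blocks[block_id])
                break
        else:
            raise IOError(f"no replica available for block {block_id}")
    return b"".join(chunks)


if __name__ == "__main__":
    nodes = {n: DataNode(n) for n in ("dn1", "dn2", "dn3")}
    nn = NameNode()
    nn.files["/demo.txt"] = [(1, ["dn1", "dn2"]), (2, ["dn2", "dn3"])]
    nodes["dn1"].blocks[1] = nodes["dn2"].blocks[1] = b"hello "
    nodes["dn2"].blocks[2] = nodes["dn3"].blocks[2] = b"world"
    print(read_file("/demo.txt", nn, nodes))
```

Because each block is listed with more than one holder, the read still succeeds in this model if one copy of a block goes missing.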
HDFS - Write
[Diagram: Client, Name Node and Data Nodes; the Client's blocks 1 to 3 are written to the Data Nodes and replicated; Name Node: Control / Monitoring]
Client:
1. Create Metadata
2. Put Blocks
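A minimal sketch of the write path, again as a toy model rather than real HDFS code. Plain dicts stand in for the name node and the data nodes, the function name `write_file` is hypothetical, and the round-robin replica placement is an assumption made for this example (it is not how HDFS actually chooses nodes). The block size is set to a few bytes so the splitting is visible.

```python
def write_file(path, data, metadata, data_nodes, block_size=4, replication=3):
    """Toy HDFS-style write.

    metadata:   dict standing in for the name node (path -> list of (block_id, holders))
    data_nodes: dict standing in for the data nodes (node name -> {block_id: bytes})
    """
    if path in metadata:
        # Write once, read many: no edits, no appends.
        raise FileExistsError(path)
    names = sorted(data_nodes)
    if len(names) < replication:
        raise ValueError("not enough data nodes for the requested replication")

    # 1. Create Metadata
    entries = metadata[path] = []

    # 2. Put Blocks
    for index, start in enumerate(range(0, len(data), block_size)):
        block_id = f"{path}#{index}"
        # Round-robin placement: an assumption for this sketch only.
        holders = [names[(index + r) % len(names)] for r in range(replication)]
        for name in holders:
            data_nodes[name][block_id] = data[start:start + block_size]
        entries.append((block_id, holders))
    return entries


if __name__ == "__main__":
    meta = {}
    nodes = {"dn1": {}, "dn2": {}, "dn3": {}, "dn4": {}}
    for block_id, holders in write_file("/f", b"abcdefghij", meta, nodes):
        print(block_id, holders)
```

Calling `write_file` a second time with the same path raises `FileExistsError`, which is the sketch's stand-in for the "no edits, no appends" rule.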
Distributed Processing
• Slots
- X mapper slots, Y reducer slots (per node)
• Jobs
- Queued
- Prioritised
• Tasks
- Data-aware
Distributed Processing
[Diagram: Client, Job Tracker and five Task Trackers, each with four mapper (M) slots and two reducer (R) slots; Job Tracker: Control / Monitoring]
Client:
1. Setup Job
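The ideas of slots, prioritised job queues and data-aware tasks can be put together in a small scheduling sketch. It is a simplification, not the Job Tracker's real algorithm: the function `schedule_map_tasks`, the job names and the node names are hypothetical, only mapper slots are modelled, and tasks that find no free slot are simply left unassigned.

```python
import heapq


def schedule_map_tasks(jobs, free_slots, block_locations):
    """Assign map tasks to free mapper slots.

    jobs:            list of (priority, job_name, [block ids]); lower number runs first
    free_slots:      dict node -> number of free mapper slots (the "X" per node)
    block_locations: dict block id -> set of nodes holding a copy of that block
    Returns a list of (job_name, block_id, node, is_local).
    """
    queue = list(jobs)
    heapq.heapify(queue)            # jobs are queued and prioritised
    slots = dict(free_slots)        # work on a copy, leave the caller's dict alone
    assignments = []
    while queue and any(slots.values()):
        _, job_name, blocks = heapq.heappop(queue)
        for block in blocks:
            # Data-aware: prefer a node that already stores the block.
            local = [n for n in sorted(block_locations.get(block, ()))
                     if slots.get(n, 0) > 0]
            anywhere = [n for n in sorted(slots) if slots[n] > 0]
            if not anywhere:
                break               # cluster is full; remaining tasks wait
            node = local[0] if local else anywhere[0]
            slots[node] -= 1
            assignments.append((job_name, block, node, bool(local)))
    return assignments


if __name__ == "__main__":
    jobs = [(2, "report", ["b1", "b2"]), (1, "urgent", ["b3"])]
    free = {"node1": 1, "node2": 2}
    where = {"b1": {"node1"}, "b2": {"node1"}, "b3": {"node2"}}
    for row in schedule_map_tasks(jobs, free, where):
        print(row)
```

In the example, the higher-priority job is placed first, and the task for block `b2` runs on a node that does not hold the block because the only node that does has no free slot left.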
Implementation
• Two modes of operation
- Standalone: Name Node, Data Node, Job Tracker and Task Tracker together
- Master / Slaves: the Master runs the Name Node and Job Tracker; each Slave runs a Data Node and a Task Tracker (six Slaves in the diagram)
Building upon the basics
Sub-projects
• Map/Reduce – divide & conquer
• Pig – SQL-like “Pig Latin”
• HBase – column-based database
• Hive – data-warehousing (SQL-like queries)
• Mahout – distributed algorithms
Map/Reduce
• Java-based
- Key,Value input, Key,Value output(s)
• Intended for low-level / bespoke work
[Diagram: Start, then mapper tasks (M), then reducer tasks (R), then End]
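To make the key/value flow concrete, here is a word count run through an in-memory simulation of the map, group and reduce steps. It is written in Python for brevity and does not use the Hadoop Java API; the names `map_words`, `reduce_counts` and `run_map_reduce` are hypothetical, and everything runs in one process rather than across a cluster.

```python
from collections import defaultdict


def map_words(key, value):
    """Map: (line number, line of text) -> (word, 1) pairs."""
    for word in value.lower().split():
        yield word, 1


def reduce_counts(key, values):
    """Reduce: (word, [1, 1, ...]) -> (word, total)."""
    yield key, sum(values)


def run_map_reduce(records, mapper, reducer):
    """Run mapper over every record, group by key, then run reducer per key."""
    grouped = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            grouped[out_key].append(out_value)   # grouping between the M and R steps
    results = []
    for key in sorted(grouped):
        results.extend(reducer(key, grouped[key]))
    return results


if __name__ == "__main__":
    lines = [(0, "the quick brown fox"), (1, "the lazy dog")]
    print(run_map_reduce(lines, map_words, reduce_counts))
```

Both the mapper and the reducer take a key and a value (or values) in and emit key/value pairs out, which is the "Key,Value input, Key,Value output(s)" contract from the slide.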
Hive
• SQL-like syntax, Map/Reduce under the hood
• Client-only software
[Diagram: Query, then a chain of four Map/Reduce (M R) stages, then Results]
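The idea of a SQL-like query becoming a chain of Map/Reduce stages can be sketched as follows. This is an in-memory Python illustration, not Hive itself: the query in the comment, the `events` table, the `channel` column and the two-stage split are all assumptions chosen for the example, and the plan Hive would really generate may differ.

```python
from collections import defaultdict

# Hypothetical query, for illustration only:
#   SELECT channel, COUNT(*) AS views FROM events GROUP BY channel ORDER BY views DESC


def run_stage(records, mapper, reducer):
    """One Map/Reduce stage: map, group by key, reduce in key order."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    output = []
    for key in sorted(grouped):
        output.extend(reducer(key, grouped[key]))
    return output


def run_query(events):
    # Stage 1 (M R): GROUP BY channel, COUNT(*)
    counts = run_stage(
        events,
        mapper=lambda row: [(row["channel"], 1)],
        reducer=lambda channel, ones: [(channel, sum(ones))],
    )
    # Stage 2 (M R): ORDER BY views DESC, done by keying on the negated count
    return run_stage(
        counts,
        mapper=lambda pair: [((-pair[1], pair[0]), None)],
        reducer=lambda key, _values: [(key[1], -key[0])],
    )


if __name__ == "__main__":
    events = [{"channel": c} for c in
              ("news", "sport", "news", "movies", "news", "sport")]
    print(run_query(events))
```

The output of the first stage feeds the second, mirroring the "M R M R ..." chain between Query and Results in the diagram.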
Live Demo
Lastly, a word of warning...
• It’s not a magic bullet…
• If the tools you need don’t exist…
• Approach is everything…
• Hadoop is *just* the framework
Thank you!
Questions?
http://cotdp.com/hadoop.html
- Soft-copy of this presentation
- VM image available to download
- Example code is on GitHub