Hadoop Introduction
Hadoop Introduction
Background && Installation && Hello World && Related
Outline
• Background
• Hello World
• Installation
• Related
23/4/12 2
Background
• Why Hadoop?
  • Accessible: runs on cloud services such as AWS
  • Robust: handles most hardware failures gracefully
  • Scalable: scales linearly as nodes are added
  • Simple: 1 == 1w (w = 10,000)
• Key Points:
  • Scale-out
  • Moving code to data
Background: History
• Apache top-level project: Doug Cutting
• Lucene -> Nutch -> Hadoop (2004)
• Yahoo (1w = 10,000)
• Facebook (Hive, HBase, …)
• Hulu (HBase)
• Baidu (3000 TB, one week)
• Twitter (tweet data)
Background
• Comparing SQL databases and Hadoop
• Structure:
  • SQL: structured data with a fixed schema
  • Hadoop: key-value pairs, suited to less-structured data such as text and pictures
• Scale-out instead of scale-up
• Key-value pairs instead of relational tables
• Functional programming instead of declarative queries
• Offline batch processing (write once, read many times) instead of online transactions
Background – Understanding
• Word Count
• As file size grows, the word table no longer fits in memory
• Disk-based hash table (more complex)
• Distributed:
  • Phase 1: each machine processes part of the input
  • Phase 2: merge the partial results
• Shuffle the partitions to the appropriate machines (e.g. alphabetically by word)
• With that, we have already built a minimal Hadoop.
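The two phases and the shuffle step can be sketched in plain Python. This is a minimal single-process illustration, not how Hadoop itself is implemented. The function names (`count_part`, `partition_of`, `shuffle`, `merge`) and the first-letter routing rule are assumptions made for this sketch.

```python
from collections import Counter


def count_part(lines):
    """Phase 1: count the words in one part of the input."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def partition_of(word, num_partitions):
    """Shuffle rule (assumed): route a word by its first letter."""
    return ord(word[0].lower()) % num_partitions


def shuffle(partial_counts, num_partitions):
    """Send every (word, count) pair to the partition that owns the word.

    Each partition sums the counts it receives, so a word's total ends up
    in exactly one place.
    """
    partitions = [Counter() for _ in range(num_partitions)]
    for counts in partial_counts:
        for word, n in counts.items():
            partitions[partition_of(word, num_partitions)][word] += n
    return partitions


def merge(partitions):
    """Phase 2: the partitions hold disjoint words, so merging is a union."""
    total = {}
    for partition in partitions:
        total.update(partition)
    return total


if __name__ == "__main__":
    parts = [["a b a"], ["b c"]]          # two "machines", one part each
    partial = [count_part(p) for p in parts]
    print(merge(shuffle(partial, 2)))
```

Because every occurrence of a word is routed to the same partition, no partition needs to know about the others when it sums its counts.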
Hello World: Word Count
• Two Phases:
  • Mapping: read the input data and load it into the mappers
  • Reducing: process all of the mappers' output and produce the final result
• 1.1 list(filename, file content)
• 1.2 list(word, 1)
• 2.1 list(word, list(1))
• 2.2 list(word, count)
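The four stages can be traced with ordinary Python lists. This is a sketch for illustration only: the function names (`map_phase`, `group_by_key`, `reduce_phase`) and the sample documents are made up here, and the grouping step stands in for what the framework does between map and reduce.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(documents):
    """1.1 -> 1.2: turn (filename, content) pairs into (word, 1) pairs."""
    for _filename, content in documents:
        for word in content.split():
            yield (word, 1)


def group_by_key(pairs):
    """1.2 -> 2.1: collect all values that share the same word."""
    ordered = sorted(pairs, key=itemgetter(0))
    for word, group in groupby(ordered, key=itemgetter(0)):
        yield (word, [value for _, value in group])


def reduce_phase(grouped):
    """2.1 -> 2.2: sum each word's list of ones into a count."""
    for word, values in grouped:
        yield (word, sum(values))


if __name__ == "__main__":
    docs = [("a.txt", "hello world"), ("b.txt", "hello hadoop")]
    for word, count in reduce_phase(group_by_key(map_phase(docs))):
        print(word, count)
```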
Hello World
• mapper.py
• Reducer.py
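The scripts themselves are not included in this transcript. A minimal sketch of what such a pair might look like for Hadoop Streaming follows, assuming the usual convention of one tab-separated `word<TAB>count` record per line and reducer input sorted by key. The function names and the `simulate` helper are hypothetical and exist only so the sketch can be run without a cluster.

```python
import io


def run_mapper(stdin, stdout):
    """Body of a mapper script: emit 'word<TAB>1' for every word read."""
    for line in stdin:
        for word in line.split():
            stdout.write(f"{word}\t1\n")


def run_reducer(stdin, stdout):
    """Body of a reducer script: sum the counts of consecutive equal words.

    Relies on the input arriving sorted by word, so all records for one
    word are adjacent.
    """
    current, total = None, 0
    for line in stdin:
        if not line.strip():
            continue  # ignore blank lines
        word, _, count = line.rstrip("\n").partition("\t")
        if word != current:
            if current is not None:
                stdout.write(f"{current}\t{total}\n")
            current, total = word, 0
        total += int(count)
    if current is not None:
        stdout.write(f"{current}\t{total}\n")


def simulate(text):
    """Local stand-in for the pipeline: map, sort by key, then reduce."""
    mapped = io.StringIO()
    run_mapper(io.StringIO(text), mapped)
    ordered = "".join(sorted(mapped.getvalue().splitlines(keepends=True)))
    reduced = io.StringIO()
    run_reducer(io.StringIO(ordered), reduced)
    return reduced.getvalue()
```

In the real scripts, each file would end with a call such as `run_mapper(sys.stdin, sys.stdout)`, since Hadoop Streaming talks to the scripts through stdin and stdout.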
Installation
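• Mode:
  • Standalone mode (default)
  • Pseudo-distributed mode: recommended for development and debugging
  • Fully distributed mode
• Configuration:
  • Basic configuration
  • SSH configuration
  • Ubuntu configuration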
Hadoop Framework
• HDFS:
  • NameNode: tracking, directing, bookkeeping
  • DataNode: low-level I/O operations
  • Secondary NameNode
• MapReduce:
  • JobTracker
  • TaskTracker
Related
• Programming:
  • Java
  • Python
    • Jython (runs Python on the JVM)
    • Hadoop Streaming (stdin, stdout)
    • Dumbo
    • Happy
Related
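• Pig: high-level data-flow language
• Hive: SQL-style data warehouse
• HBase: column-oriented database modeled on Google Bigtable
• ZooKeeper: coordination system for shared state
• Chukwa: data collection system
• Mahout: data mining and machine learning
• Hama: matrix computation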
Resource
• Book:
  • Hadoop in Action
  • Hadoop 实战 (2nd edition)
• Video && Google Course
• URL:
  • Bookmarked resources
Thanks