Hadoop
Apache Hadoop Seminar
![Page 1: Hadoop](https://reader033.vdocuments.site/reader033/viewer/2022051613/54c6fa1f4a795931168b45e2/html5/thumbnails/1.jpg)
Presented by NIKHIL P L
![Page 2: Hadoop](https://reader033.vdocuments.site/reader033/viewer/2022051613/54c6fa1f4a795931168b45e2/html5/thumbnails/2.jpg)
Apache Hadoop
• Developer(s): Apache Software Foundation
• Type: Distributed file system and processing framework
• License: Apache License 2.0
• Written in: Java
• OS: Cross-platform
• Created by: Doug Cutting and Mike Cafarella (2005)
• Inspired by: Google's MapReduce and Google File System (GFS) papers
![Page 3: Hadoop](https://reader033.vdocuments.site/reader033/viewer/2022051613/54c6fa1f4a795931168b45e2/html5/thumbnails/3.jpg)
Subprojects
• HDFS
  – Distributed, scalable, and portable file system
  – Stores large data sets
  – Copes with hardware failure
  – Runs on top of each node's existing local file system
![Page 4: Hadoop](https://reader033.vdocuments.site/reader033/viewer/2022051613/54c6fa1f4a795931168b45e2/html5/thumbnails/4.jpg)
HDFS - Replication
• Data blocks are replicated across multiple nodes (three replicas by default)
• A node can fail without any data being lost
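The replication idea can be illustrated with a small in-memory simulation. This is a sketch, not HDFS code: it assumes a 128 MB block size (the Hadoop 2 default; Hadoop 1 used 64 MB) and a replication factor of 3, places replicas on randomly chosen distinct nodes instead of following HDFS's rack-aware placement policy, and uses hypothetical helper names (`split_into_blocks`, `place_replicas`, `readable_after_failure`).

```python
import math
import random
from typing import Dict, List

BLOCK_SIZE_MB = 128  # assumed block size (Hadoop 2+ default; Hadoop 1 used 64 MB)
REPLICATION = 3      # assumed replication factor (HDFS default)


def split_into_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> int:
    """Number of blocks a file occupies; the last block may be partially filled."""
    return math.ceil(file_size_mb / block_size_mb)


def place_replicas(num_blocks: int, nodes: List[str],
                   replication: int = REPLICATION, seed: int = 0) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes (random, NOT rack-aware)."""
    rng = random.Random(seed)
    return {block: rng.sample(nodes, replication) for block in range(num_blocks)}


def readable_after_failure(placement: Dict[int, List[str]], failed_nodes: List[str]) -> bool:
    """True if every block still has at least one replica on a live node."""
    failed = set(failed_nodes)
    return all(any(n not in failed for n in replicas) for replicas in placement.values())


if __name__ == "__main__":
    nodes = [f"datanode{i}" for i in range(1, 6)]
    blocks = split_into_blocks(500)  # a 500 MB file needs 4 blocks of 128 MB
    placement = place_replicas(blocks, nodes)
    for block, replicas in placement.items():
        print(f"block {block}: {replicas}")
    # Each block sits on 3 distinct nodes, so any 2 failures leave a live copy.
    print(readable_after_failure(placement, ["datanode1", "datanode2"]))
```

Because every block lives on three distinct nodes, any two simultaneous failures still leave a copy of each block. Losing all three holders of one block would lose it, which is why HDFS re-replicates under-replicated blocks when a DataNode goes down.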
![Page 5: Hadoop](https://reader033.vdocuments.site/reader033/viewer/2022051613/54c6fa1f4a795931168b45e2/html5/thumbnails/5.jpg)
Subprojects (continued)
• MapReduce
  – Programming model originally described by Google
  – Hadoop's core data-processing engine
  – Programs are expressed as Map and Reduce functions
  – Useful in a wide range of applications:
    • distributed pattern-based searching, distributed sorting, web link-graph reversal, machine learning, statistical machine translation
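The classic illustration of Map and Reduce functions is word counting. The sketch below runs the three phases (map, shuffle/group by key, reduce) in a single Python process; it only mimics the programming model and does not use Hadoop's APIs. On a real cluster each phase runs in parallel across many nodes, and the function names here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all intermediate values by key, ordered by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_fn(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    intermediate = (pair for line in lines for pair in map_fn(line))
    grouped = shuffle(intermediate)
    return dict(reduce_fn(k, vs) for k, vs in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop stores data", "Hadoop processes data"]
    print(word_count(docs))
    # {'data': 2, 'hadoop': 2, 'processes': 1, 'stores': 1}
```

The map and reduce functions never see each other's data directly; only the shuffle connects them, which is what lets the framework spread each phase across machines.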
![Page 6: Hadoop](https://reader033.vdocuments.site/reader033/viewer/2022051613/54c6fa1f4a795931168b45e2/html5/thumbnails/6.jpg)
MapReduce - Workflow
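Between the map and reduce stages of this workflow, every intermediate key must be routed to exactly one reducer so that all values for that key meet in one place. Hadoop's default `HashPartitioner` uses `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The sketch below reproduces that rule in Python under one stated assumption: keys are hashed with Java's `String.hashCode()` (re-implemented in the hypothetical `java_string_hash`, since Python's built-in `hash()` is randomized per process). Hadoop's `Text` keys hash their UTF-8 bytes differently, so real reducer numbers may differ, but the routing rule is the same. `partition` and `partition_and_sort` are also hypothetical names.

```python
from typing import Iterable, List, Tuple


def java_string_hash(s: str) -> int:
    """Java's String.hashCode(): h = 31*h + c over UTF-16 code units, 32-bit signed."""
    data = s.encode("utf-16-be")
    h = 0
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]
        h = (31 * h + unit) & 0xFFFFFFFF  # wrap like Java int arithmetic
    return h - (1 << 32) if h >= (1 << 31) else h


def partition(key: str, num_reducers: int) -> int:
    """HashPartitioner rule: clear the sign bit, then take the remainder."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


def partition_and_sort(pairs: Iterable[Tuple[str, int]],
                       num_reducers: int) -> List[List[Tuple[str, int]]]:
    """Route map output to reducers, then sort each reducer's input by key."""
    buckets: List[List[Tuple[str, int]]] = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return [sorted(bucket) for bucket in buckets]


if __name__ == "__main__":
    map_output = [("hadoop", 1), ("data", 1), ("hadoop", 1), ("hdfs", 1)]
    for reducer, bucket in enumerate(partition_and_sort(map_output, 2)):
        print(reducer, bucket)
```

Masking with `0x7FFFFFFF` matters because Java hash codes can be negative, and a negative remainder would not be a valid reducer index.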
![Page 7: Hadoop](https://reader033.vdocuments.site/reader033/viewer/2022051613/54c6fa1f4a795931168b45e2/html5/thumbnails/7.jpg)
Hadoop cluster (Terminology)
![Page 8: Hadoop](https://reader033.vdocuments.site/reader033/viewer/2022051613/54c6fa1f4a795931168b45e2/html5/thumbnails/8.jpg)
Types of Nodes
• HDFS nodes
  – NameNode (master)
  – DataNodes (slaves)
• MapReduce nodes
  – JobTracker (master)
  – TaskTrackers (slaves)
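The master/slave split can be pictured as two cooperating lookups: the NameNode records which DataNodes hold each block, and the JobTracker uses that information to run each map task on a TaskTracker that already stores the task's input block (data locality), falling back to any free node otherwise. The sketch below is a simplified single-process model under those assumptions; the class and method names are hypothetical, not Hadoop's internal APIs, and it ignores rack locality and per-node slot counts (each TaskTracker takes one task here).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class NameNode:
    """HDFS master: stores only metadata (block id -> DataNodes holding it)."""
    block_locations: Dict[str, Set[str]] = field(default_factory=dict)

    def add_replica(self, block: str, datanode: str) -> None:
        self.block_locations.setdefault(block, set()).add(datanode)

    def locate(self, block: str) -> Set[str]:
        return self.block_locations.get(block, set())


@dataclass
class JobTracker:
    """MapReduce master: assigns one map task per free TaskTracker."""
    namenode: NameNode

    def schedule(self, blocks: List[str], free_trackers: List[str]) -> Dict[str, str]:
        free = list(free_trackers)
        assignment: Dict[str, str] = {}
        for block in blocks:
            if not free:
                break  # no capacity left; remaining tasks would wait
            local = [t for t in free if t in self.namenode.locate(block)]
            chosen = local[0] if local else free[0]  # prefer a data-local node
            assignment[block] = chosen
            free.remove(chosen)
        return assignment


if __name__ == "__main__":
    nn = NameNode()
    replicas = {"b1": ["node1", "node2"], "b2": ["node2", "node3"], "b3": ["node3"]}
    for block, holders in replicas.items():
        for node in holders:
            nn.add_replica(block, node)
    jt = JobTracker(nn)
    print(jt.schedule(["b1", "b2", "b3"], ["node1", "node2", "node3"]))
    # {'b1': 'node1', 'b2': 'node2', 'b3': 'node3'}
```

Moving the computation to the node that already holds the data avoids shipping large blocks across the network, which is the main reason the two masters share block-location information.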
![Page 9: Hadoop](https://reader033.vdocuments.site/reader033/viewer/2022051613/54c6fa1f4a795931168b45e2/html5/thumbnails/9.jpg)
Types of Nodes (continued)
![Page 10: Hadoop](https://reader033.vdocuments.site/reader033/viewer/2022051613/54c6fa1f4a795931168b45e2/html5/thumbnails/10.jpg)
Subprojects (continued)
• Hive
  – Provides data summarization, query, and analysis
  – Initially developed by Facebook
• HBase
  – Open-source, non-relational, distributed database
  – Provides capabilities modeled on Google's Bigtable
![Page 11: Hadoop](https://reader033.vdocuments.site/reader033/viewer/2022051613/54c6fa1f4a795931168b45e2/html5/thumbnails/11.jpg)
Subprojects (continued)
• ZooKeeper
  – Distributed configuration service, synchronization service, notification system, and naming registry for large distributed systems
• Pig
  – A high-level language (Pig Latin) and a compiler that generates Hadoop MapReduce programs
  – Originally developed at Yahoo!
![Page 12: Hadoop](https://reader033.vdocuments.site/reader033/viewer/2022051613/54c6fa1f4a795931168b45e2/html5/thumbnails/12.jpg)
How Does Hadoop Work?
• How HDFS works
![Page 13: Hadoop](https://reader033.vdocuments.site/reader033/viewer/2022051613/54c6fa1f4a795931168b45e2/html5/thumbnails/13.jpg)
How Does Hadoop Work? (continued)
• How MapReduce works
![Page 14: Hadoop](https://reader033.vdocuments.site/reader033/viewer/2022051613/54c6fa1f4a795931168b45e2/html5/thumbnails/14.jpg)
How Does Hadoop Work? (continued)
• How MapReduce works
![Page 15: Hadoop](https://reader033.vdocuments.site/reader033/viewer/2022051613/54c6fa1f4a795931168b45e2/html5/thumbnails/15.jpg)
How Does Hadoop Work? (continued)
• Managing Hadoop jobs
![Page 16: Hadoop](https://reader033.vdocuments.site/reader033/viewer/2022051613/54c6fa1f4a795931168b45e2/html5/thumbnails/16.jpg)
Applications
• Marketing analytics
• Machine learning (e.g., spam filters)
• Image processing
• Processing of XML messages
![Page 17: Hadoop](https://reader033.vdocuments.site/reader033/viewer/2022051613/54c6fa1f4a795931168b45e2/html5/thumbnails/17.jpg)
• World's largest Hadoop production application
• ~20,000 machines running Hadoop
![Page 18: Hadoop](https://reader033.vdocuments.site/reader033/viewer/2022051613/54c6fa1f4a795931168b45e2/html5/thumbnails/18.jpg)
• The largest Hadoop cluster in the world, with 100 PB of storage
• 1200 machines with 8 cores each + 800 machines with 16 cores each
• 32 GB of RAM per machine
• 65 million files in HDFS
• 12 TB of compressed data added per day
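The per-machine figures above can be combined into cluster totals. This is plain arithmetic on the numbers quoted on the slide, assuming every machine has the stated 32 GB of RAM and, for the yearly figure, that the daily ingest rate stays constant:

```latex
\begin{aligned}
\text{machines} &= 1200 + 800 = 2000\\
\text{cores} &= 1200 \times 8 + 800 \times 16 = 9600 + 12\,800 = 22\,400\\
\text{RAM} &= 2000 \times 32\ \text{GB} = 64\,000\ \text{GB} = 64\ \text{TB}\\
\text{new data per year} &\approx 12\ \text{TB} \times 365 = 4380\ \text{TB} \approx 4.4\ \text{PB}
\end{aligned}
```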
![Page 19: Hadoop](https://reader033.vdocuments.site/reader033/viewer/2022051613/54c6fa1f4a795931168b45e2/html5/thumbnails/19.jpg)
Other Users
![Page 20: Hadoop](https://reader033.vdocuments.site/reader033/viewer/2022051613/54c6fa1f4a795931168b45e2/html5/thumbnails/20.jpg)
Thanks