Hadoop Hive Presentation
![Page 1: Hadoop hive presentation](https://reader034.vdocuments.site/reader034/viewer/2022052323/559453f51a28abce4f8b47c9/html5/thumbnails/1.jpg)
Hadoop
![Page 2: Hadoop hive presentation](https://reader034.vdocuments.site/reader034/viewer/2022052323/559453f51a28abce4f8b47c9/html5/thumbnails/2.jpg)
Agenda
• Problems with traditional large-scale systems
• Requirements for new approaches
• What is Hadoop?
• Why Hadoop?
• Overview of Hadoop
• HDFS
• MapReduce
• Applications
• Conclusion
![Page 3: Hadoop hive presentation](https://reader034.vdocuments.site/reader034/viewer/2022052323/559453f51a28abce4f8b47c9/html5/thumbnails/3.jpg)
![Page 4: Hadoop hive presentation](https://reader034.vdocuments.site/reader034/viewer/2022052323/559453f51a28abce4f8b47c9/html5/thumbnails/4.jpg)
Problems with traditional large-scale systems
• Data volume grows day by day
• Network failures
• Server failures
• Loss of data
• High cost
• Distributed computing requires manual processing
![Page 5: Hadoop hive presentation](https://reader034.vdocuments.site/reader034/viewer/2022052323/559453f51a28abce4f8b47c9/html5/thumbnails/5.jpg)
Requirements for new approaches
• Data should be stored in a distributed manner and processed in parallel
• High performance at lower cost
• Should be scalable
• Should be simple to access and process
• Should be fault tolerant
![Page 6: Hadoop hive presentation](https://reader034.vdocuments.site/reader034/viewer/2022052323/559453f51a28abce4f8b47c9/html5/thumbnails/6.jpg)
What is Hadoop?
• An open-source framework
• Processes large amounts of data
![Page 7: Hadoop hive presentation](https://reader034.vdocuments.site/reader034/viewer/2022052323/559453f51a28abce4f8b47c9/html5/thumbnails/7.jpg)
![Page 8: Hadoop hive presentation](https://reader034.vdocuments.site/reader034/viewer/2022052323/559453f51a28abce4f8b47c9/html5/thumbnails/8.jpg)
Why Hadoop?
• Accessible
• Scalable
• Robust
• Simple
![Page 9: Hadoop hive presentation](https://reader034.vdocuments.site/reader034/viewer/2022052323/559453f51a28abce4f8b47c9/html5/thumbnails/9.jpg)
Overview of Hadoop
Hadoop handles three types of data:
• Structured
• Semi-structured
• Unstructured
It analyzes and processes large amounts of data (petabyte scale).
![Page 10: Hadoop hive presentation](https://reader034.vdocuments.site/reader034/viewer/2022052323/559453f51a28abce4f8b47c9/html5/thumbnails/10.jpg)
Comparison with traditional databases
RDBMS
• Stores gigabytes of data
• Supports both batch and interactive processing
• Allows updates
• Schemas must be defined
• Only structured data
HADOOP
• Stores petabytes of data
• Batch processing only
• Does not allow updates; it follows the write-once, read-many (WORM) model
• Schemas are not required
• Supports all three types of data
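The "schemas not required" and WORM points above can be illustrated with a small sketch. It is not Hadoop code: it is plain Python, and the sample lines, the field names, and the helper `read_with_schema` are all hypothetical, invented only for illustration. The idea shown is that raw data is stored once, unchanged, and a schema is applied only when the data is read.

```python
import csv
import io

# Hypothetical raw log lines, stored exactly as they arrived (write once).
RAW_LINES = [
    "2015-07-01,alice,login",
    "2015-07-01,bob,purchase,19.99",
    "malformed line without commas",
]

def read_with_schema(lines, field_names):
    """Apply a schema at read time; rows with too few fields are skipped."""
    records = []
    for row in csv.reader(io.StringIO("\n".join(lines))):
        if len(row) < len(field_names):
            continue  # this row does not fit the requested schema
        # zip() ignores extra trailing fields the schema does not name
        records.append(dict(zip(field_names, row)))
    return records

# The same stored lines could be read again later with a different schema.
events = read_with_schema(RAW_LINES, ["date", "user", "action"])
print(events)
```

An RDBMS would instead reject or reshape the rows at write time, because the table schema has to exist before any data is inserted.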
![Page 11: Hadoop hive presentation](https://reader034.vdocuments.site/reader034/viewer/2022052323/559453f51a28abce4f8b47c9/html5/thumbnails/11.jpg)
![Page 12: Hadoop hive presentation](https://reader034.vdocuments.site/reader034/viewer/2022052323/559453f51a28abce4f8b47c9/html5/thumbnails/12.jpg)
![Page 13: Hadoop hive presentation](https://reader034.vdocuments.site/reader034/viewer/2022052323/559453f51a28abce4f8b47c9/html5/thumbnails/13.jpg)
Components
Hadoop can be divided into two parts:
1. HDFS – Hadoop Distributed File System
2. MapReduce programming model
![Page 14: Hadoop hive presentation](https://reader034.vdocuments.site/reader034/viewer/2022052323/559453f51a28abce4f8b47c9/html5/thumbnails/14.jpg)
Hadoop Distributed File System
It is a distributed file system
Runs on commodity hardware
Provides high-throughput access to application data
Suitable for applications that have large data sets
Designed to store very large amounts of data (terabytes or petabytes)
![Page 15: Hadoop hive presentation](https://reader034.vdocuments.site/reader034/viewer/2022052323/559453f51a28abce4f8b47c9/html5/thumbnails/15.jpg)
![Page 16: Hadoop hive presentation](https://reader034.vdocuments.site/reader034/viewer/2022052323/559453f51a28abce4f8b47c9/html5/thumbnails/16.jpg)
Core Architectural Goal of HDFS
An HDFS instance may consist of thousands of server machines, so at that scale some component is almost always failing.
The core goal is therefore to detect faults and recover from them quickly and automatically.
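The fault-detection goal above can be sketched with a toy heartbeat monitor. In HDFS, DataNodes send periodic heartbeats to the NameNode, but the class below is not the HDFS implementation: `HeartbeatMonitor`, the node names, and the 30-second timeout are assumptions chosen only for illustration.

```python
class HeartbeatMonitor:
    """Toy failure detector: a node is presumed dead once it misses heartbeats."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}  # node name -> time of its last heartbeat

    def heartbeat(self, node, now):
        """Record that `node` reported in at time `now` (in seconds)."""
        self.last_seen[node] = now

    def dead_nodes(self, now):
        """Return the nodes whose last heartbeat is older than the timeout."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )

monitor = HeartbeatMonitor(timeout_seconds=30)  # timeout is an arbitrary choice
monitor.heartbeat("node-a", now=0)
monitor.heartbeat("node-b", now=0)
monitor.heartbeat("node-a", now=25)  # node-b has gone silent
failed = monitor.dead_nodes(now=40)
print(failed)
```

Timestamps are passed in explicitly so the sketch stays deterministic. A real system would follow detection with recovery, for example by copying the data that lived on the failed node from its surviving replicas.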
![Page 17: Hadoop hive presentation](https://reader034.vdocuments.site/reader034/viewer/2022052323/559453f51a28abce4f8b47c9/html5/thumbnails/17.jpg)
MapReduce Programming Model
MapReduce applies a divide-and-conquer approach to the data
Schedules execution across a set of machines
Manages inter-process communication
Reducers process the output of all mappers to produce the final output
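The divide-and-conquer idea above can be sketched on a single machine. This is a minimal illustration, not the Hadoop scheduler: threads stand in for cluster machines, and the helper names `split`, `partial_sum`, and `parallel_sum` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def split(data, parts):
    """Divide the input into contiguous chunks of roughly equal size."""
    size = max(1, -(-len(data) // parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def partial_sum(chunk):
    """Work done independently on one chunk (the 'map' side)."""
    return sum(chunk)

def parallel_sum(data, workers=4):
    """Split the data, process the chunks concurrently, combine the results."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    return sum(partials)  # combine step (the 'reduce' side)

total = parallel_sum(list(range(1, 101)))
print(total)  # 5050
```

In Hadoop the framework, not the programmer, decides where each chunk runs and moves the intermediate results between machines.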
![Page 18: Hadoop hive presentation](https://reader034.vdocuments.site/reader034/viewer/2022052323/559453f51a28abce4f8b47c9/html5/thumbnails/18.jpg)
MapReduce Programming Model
– MAP
• map() function that processes a key/value pair to generate a set of intermediate key/value pairs
– REDUCE
• reduce() function that merges all intermediate values associated with the same intermediate key
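The two functions described above can be sketched with the classic word-count example. This is an in-memory Python illustration of the programming model, not the Hadoop API: `map_fn`, `reduce_fn`, `run_mapreduce`, and the sample input are hypothetical names and data chosen for this sketch.

```python
from collections import defaultdict

def map_fn(_key, line):
    """Emit an intermediate (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]

def reduce_fn(word, counts):
    """Merge all intermediate values that share the same intermediate key."""
    return word, sum(counts)

def run_mapreduce(records):
    """Run map over every record, group by key, then reduce each group."""
    grouped = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            grouped[out_key].append(out_value)  # shuffle: group values by key
    return dict(reduce_fn(k, v) for k, v in sorted(grouped.items()))

# Input records are (line number, line text) key/value pairs.
lines = [(0, "Hadoop stores data"), (1, "Hadoop processes data")]
word_counts = run_mapreduce(lines)
print(word_counts)
```

The grouping step in the middle is what a real framework performs between the map and reduce phases, so that each reduce call sees every value for one key.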
![Page 19: Hadoop hive presentation](https://reader034.vdocuments.site/reader034/viewer/2022052323/559453f51a28abce4f8b47c9/html5/thumbnails/19.jpg)
![Page 20: Hadoop hive presentation](https://reader034.vdocuments.site/reader034/viewer/2022052323/559453f51a28abce4f8b47c9/html5/thumbnails/20.jpg)
Applications
![Page 21: Hadoop hive presentation](https://reader034.vdocuments.site/reader034/viewer/2022052323/559453f51a28abce4f8b47c9/html5/thumbnails/21.jpg)
![Page 22: Hadoop hive presentation](https://reader034.vdocuments.site/reader034/viewer/2022052323/559453f51a28abce4f8b47c9/html5/thumbnails/22.jpg)
![Page 23: Hadoop hive presentation](https://reader034.vdocuments.site/reader034/viewer/2022052323/559453f51a28abce4f8b47c9/html5/thumbnails/23.jpg)
![Page 24: Hadoop hive presentation](https://reader034.vdocuments.site/reader034/viewer/2022052323/559453f51a28abce4f8b47c9/html5/thumbnails/24.jpg)
![Page 25: Hadoop hive presentation](https://reader034.vdocuments.site/reader034/viewer/2022052323/559453f51a28abce4f8b47c9/html5/thumbnails/25.jpg)
REFERENCES
• Hadoop in Action, by Chuck Lam
• YouTube
• Wikipedia
• Google Images
![Page 26: Hadoop hive presentation](https://reader034.vdocuments.site/reader034/viewer/2022052323/559453f51a28abce4f8b47c9/html5/thumbnails/26.jpg)
Conclusion
![Page 27: Hadoop hive presentation](https://reader034.vdocuments.site/reader034/viewer/2022052323/559453f51a28abce4f8b47c9/html5/thumbnails/27.jpg)