big dat ppt

Presented By- SHAILJA DALMIA 13IT252 BIG DATA ANALYTICS USING HADOOP

Upload: shailja-dalmia

Post on 11-Apr-2017


TRANSCRIPT

Page 2: big dat ppt

INTRODUCTION

Era of the digitized world

Challenges for cutting-edge businesses

GFS and MapReduce

In 2006, Mike Cafarella and Doug Cutting, working under the Nutch project, implemented Hadoop.

Open-source framework for writing and running distributed applications.

Page 3: big dat ppt

WHAT IS BIG DATA?

Page 4: big dat ppt

WHY DFS?

Page 5: big dat ppt

What is Distributed File System?

Page 6: big dat ppt

What is Hadoop?

Page 7: big dat ppt

Hadoop Core Components

Page 8: big dat ppt

What is HDFS?

Page 9: big dat ppt

Design of HDFS

Area where HDFS is not a good fit

Page 10: big dat ppt

HDFS COMPONENTS

NameNode

DataNodes

Page 11: big dat ppt

Job Tracker and Task Tracker

Page 12: big dat ppt

HDFS Architecture

Page 13: big dat ppt

MapReduce

• Framework that assigns tasks to each DataNode.

Map step: the master node takes the input and partitions it into smaller sub-problems, leading to a multi-level tree structure.

Reduce step: combines the results and generates the output.

Each mapping operation is independent of the others. Key-value pairs are generated, then sorted and shuffled.

Parallelism offers fault tolerance: if one node fails, the work can still be rescheduled.

Similar to the divide-and-conquer technique. Performs tasks in parallel to accomplish the work in less time.
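The map, shuffle/sort, and reduce steps above can be sketched as a small word count. This is a plain-Python simulation under stated assumptions, not the Hadoop API: the function names (`map_task`, `shuffle`, `reduce_task`, `run_with_retry`, `word_count`) and the `failing_attempts` parameter are hypothetical, the input splits are made up, and the tasks run one after another here rather than in parallel on separate nodes.

```python
from collections import defaultdict


def map_task(split):
    """Map step: emit a (word, 1) key-value pair for every word in one split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(mapped_outputs):
    """Shuffle and sort: group all values by key, with keys in sorted order."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_task(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def run_with_retry(task, arg, failing_attempts, max_attempts=3):
    """Reschedule a task when its (simulated) node fails.

    failing_attempts is hypothetical scaffolding: a set of (arg, attempt)
    pairs that should be treated as node failures.
    """
    for attempt in range(1, max_attempts + 1):
        if (arg, attempt) in failing_attempts:
            continue  # simulated failure: try the same task again
        return task(arg)
    raise RuntimeError(f"task failed {max_attempts} times")


def word_count(splits, failing_attempts=frozenset()):
    # Each split is mapped independently, so a failed task can simply be rerun.
    mapped = [run_with_retry(map_task, s, failing_attempts) for s in splits]
    grouped = shuffle(mapped)
    return dict(reduce_task(k, v) for k, v in grouped.items())


splits = ["big data needs hadoop", "hadoop stores big data", "data data data"]
# The first attempt at the second split "fails" and is rescheduled.
counts = word_count(splits, failing_attempts={("hadoop stores big data", 1)})
print(counts)
# {'big': 2, 'data': 5, 'hadoop': 2, 'needs': 1, 'stores': 1}
```

Because no map task depends on another, rerunning the failed one does not change the final counts.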

Page 14: big dat ppt

Hadoop Key Features:

Accessible

Robust

Simple

Scalable

Cost Effective

Flexible

Fault Tolerant

Page 15: big dat ppt

Differences Between Hadoop and RDBMS

Hadoop

Designed for a scale-out architecture

Key-value pairs

Functional programming (scripts and code); can build complex models

Offline processing (WORA)

RDBMS

Scaling is expensive

Tables with a relational structure

Declarative queries

Online processing (works for random reads and writes of a few records)
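The contrast between declarative queries and functional code over key-value pairs can be illustrated by computing the same totals both ways. This is only a sketch: the `sales` table, the sample records, and both function names are hypothetical, and SQLite (from the Python standard library) stands in for an RDBMS while plain Python stands in for a Hadoop job.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample records: (region, amount).
records = [("north", 10), ("south", 5), ("north", 7), ("east", 3), ("south", 1)]


def totals_declarative(rows):
    """RDBMS style: state the result wanted; the engine decides how to get it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    result = dict(conn.execute(query))
    conn.close()
    return result


def totals_key_value(rows):
    """Hadoop style: spell out the computation as steps over key-value pairs."""
    pairs = [(region, amount) for region, amount in rows]  # map: emit (key, value)
    groups = defaultdict(list)
    for key, value in pairs:  # group values by key
        groups[key].append(value)
    return {key: sum(values) for key, values in groups.items()}  # reduce


print(totals_declarative(records) == totals_key_value(records))
```

The query version is shorter, while the key-value version exposes each step, which is what allows the steps to be distributed across nodes and replaced with more complex logic.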

Page 16: big dat ppt

Hadoop Related Technologies

Avro: data serialization system with rich data structures, a container file, and a compact, fast binary data format.

Chukwa: data collection system with a powerful toolkit for analyzing the collected data.

HBase: distributed database that provides Bigtable-like capabilities.

Hive: data warehouse useful for data summarization. Uses the HiveQL language.

Page 17: big dat ppt

Conclusion

Hadoop has gained huge momentum.

The technologies around it are evolving really fast.

There is no "one size fits all" solution.

Valuable, powerful tool.

More targeted businesses.