hadoop for beginners free course ppt

Hadoop For Beginners. Available for free at hadoop-skills.com.

Upload: njain85

Post on 26-May-2015

Category: Technology

DESCRIPTION

This is a PowerPoint presentation on Hadoop and Big Data. It covers the essential knowledge one should have when stepping into the world of Big Data, and builds a basic understanding of Big Data problems and of Hadoop as a solution. The course is available for free on hadoop-skills.com.

This course takes you through:
• Understanding Big Data problems, with easy-to-understand examples and illustrations.
• The history and advent of Hadoop, right from when Hadoop wasn't even named Hadoop and was called Nutch.
• The Hadoop magic that makes it so unique and powerful.
• The difference between data science and data engineering, one of the big confusions when selecting a career or understanding a job role.
• Most importantly, demystifying Hadoop vendors such as Cloudera, MapR and Hortonworks.

TRANSCRIPT

Page 1: Hadoop for beginners   free course ppt

This is a free Course Available on Hadoop-Skills.com

Hadoop For Beginners

Available for free at hadoop-skills.com

Page 2: Hadoop for beginners   free course ppt

Understanding Big Data

Not the usual way

Page 3: Hadoop for beginners   free course ppt

The hype around Big Data

Facebook, Twitter and Google generate petabytes of data every day.

The Large Hadron Collider project discards large amounts of data because it cannot all be analysed, hoping that nothing valuable has been thrown away.

Interesting facts, but… why is Big Data important?

Let's understand via an example.

Page 4: Hadoop for beginners   free course ppt

Bank Example in the 90s

[Diagram: Bank; Optimal Price?; Maximise Profit; Insurance. Inputs: 3rd Party Survey, Expert Debates. Output: Optimal Price.]

Page 5: Hadoop for beginners   free course ppt

Bank Example – this Century

[Diagram: Bank; Optimal Price?; Maximise Profit; Insurance. Data Warehousing flow: Repository (Web Activity, Transaction, Competitors Pricing, Market Trends, Statistics), Data Warehouse, Run Statistical Algorithms, Decision Support System, Optimal Price.]

Page 6: Hadoop for beginners   free course ppt

Data Warehouse - Limitations

• Worked on small samples of data.
  • Like looking through a keyhole and estimating the size of the room.
• High turnaround time for meaningful results.
  • Like deciding to cross the road based on a picture taken 5 minutes ago.

Page 7: Hadoop for beginners   free course ppt

Big Data – Textbook definition

“Big data are a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications”

- Tom White, Hadoop: The Definitive Guide

Volume, Velocity, Variety

Page 8: Hadoop for beginners   free course ppt

Bank Example

[Diagram: Bank; Optimal Price?; Maximise Profit; Insurance. Data Warehousing flow: Repository (Web Activity, Transaction, Competitors Pricing, Market Trends, Statistics), Data Warehouse, Run Statistical Algorithms, Decision Support System, Optimal Price.]

Page 9: Hadoop for beginners   free course ppt

Future Role of Data

What the Industry is striving for…

Page 10: Hadoop for beginners   free course ppt

Role of Data – Now and in Future

Now: Data is the fundamental block to a Decision Support System.

Future: Data is the fundamental block to a Digital Nervous System.

Business @ speed of thought

Sense, Interpret, Decide, Act

Organisations behaving like a biological nervous system

[Images: Avatar, Skynet]

Page 11: Hadoop for beginners   free course ppt

Example – Digital Nervous System

[Diagram: Bank; Repository (Web Activity, Transaction, Competitors Pricing, Market Trends, Statistics); Optimal Price; Mobile Alert with Travel insurance.]

Page 12: Hadoop for beginners   free course ppt

Data – e-tsunami

International Data Corporation’s (IDC) 6th annual study:

From 2005 to 2020, the digital universe will grow by a factor of 300, from 130 exabytes to 40,000 exabytes, or 40 trillion gigabytes.

That is more than 5,200 gigabytes for every man, woman, and child in 2020.

From now until 2020, the digital universe will roughly double every two years.

33% of the digital data might be valuable if analysed, compared with 25% today.

From Gartner:

4.4 Million IT Jobs Globally to Support Big Data By 2015.
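
The IDC figures above can be cross-checked with a little arithmetic. The sketch below assumes decimal units (1 exabyte = 1 billion gigabytes); the variable names and the `projected_size` helper are illustrative and not part of the IDC study.

```python
# Sanity-check the IDC figures quoted on this slide.
# Assumption: decimal units, so 1 EB = 10**9 GB.
EB_IN_GB = 10**9

start_eb = 130       # digital universe in 2005, in exabytes
end_eb = 40_000      # projected digital universe in 2020, in exabytes

# The slide rounds this ratio to "a factor of 300".
growth_factor = end_eb / start_eb

# 40,000 EB expressed in gigabytes (40 trillion GB).
total_gb = end_eb * EB_IN_GB

# Population implied by "more than 5,200 GB per person".
implied_population = total_gb / 5_200


def projected_size(size_now: float, years: float) -> float:
    """Size after `years` years if the data doubles every two years."""
    return size_now * 2 ** (years / 2)


print(round(growth_factor))                  # ratio of 2020 to 2005
print(total_gb)                              # total gigabytes in 2020
print(round(implied_population / 1e9, 1))    # implied population, in billions
```

The ratio comes out slightly above 300, and the per-person figure implies a world population of a little under 8 billion, so the quoted numbers are consistent with each other.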

Page 13: Hadoop for beginners   free course ppt

What Triggered Big Data Technologies

Knowing Hadoop when it wasn’t Hadoop…

Page 14: Hadoop for beginners   free course ppt

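History of Hadoop

• 1996-2000: Big Data problem faced by all search engines
• 2003-04: Google File System and MapReduce papers
• 2005-06: Hadoop spawns off Nutch
• 2013: YARN / MapReduce 2 / Next Generation Hadoop

[Other labels on the timeline, including the 2010 marker; their placement is not recoverable from the transcript: "and Mike", "Dreadnaught", "Doug Joins Cloudera", "0.xx Releases of Hadoop".]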

Page 15: Hadoop for beginners   free course ppt

Introduction to Hadoop Magic

What is the new thing that Hadoop brings to computing…

Page 16: Hadoop for beginners   free course ppt

Ox and the load

Page 17: Hadoop for beginners   free course ppt

Distributed Computing

Price Advantage:
1. Clusters use commodity hardware, which is cheaper than one expensive server.
2. The software licence is free.

Page 18: Hadoop for beginners   free course ppt

Hadoop Framework – Brief overview

[Diagram: HDFS (based on Google File System): file1, Name node, Data nodes. MapReduce (based on Google MapReduce): map, map, map, map, map, Reduce. User.]
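
The map and reduce steps in the overview above can be sketched as a word count. This is a minimal in-process illustration in plain Python, not Hadoop's actual API: the function names are hypothetical, and each string in `chunks` stands in for one block of a file that would live on a separate data node.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(chunk: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    for word in chunk.lower().split():
        yield (word, 1)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as happens between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(chunks: List[str]) -> Dict[str, int]:
    # In a real cluster each chunk would be mapped on the node that stores it.
    mapped = (pair for chunk in chunks for pair in map_phase(chunk))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


chunks = ["big data big", "data hadoop"]
counts = word_count(chunks)
print(counts)
```

Each map call sees only its own chunk, which is what lets the work be spread across many machines; only the grouped intermediate pairs need to travel to the reducer.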

Page 19: Hadoop for beginners   free course ppt

The new Fundamentals

• Moving the code to the data.
• Use of commodity hardware and open source software, as against expensive proprietary software on expensive custom hardware.
• Schema on read.
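
Schema on read, the last point above, means the raw data is stored untouched and a structure is applied only when it is read. The sketch below illustrates the idea in plain Python under stated assumptions: the sample records, the field names and the `read_with_schema` helper are all hypothetical, and this is not how any particular Hadoop tool implements it.

```python
import csv
import io
from typing import Callable, Dict, Iterator

# Raw data is stored as-is, with no types or column names attached.
RAW = "1001,2015-05-26,49.90\n1002,2015-05-27,15.00\n"


def read_with_schema(
    raw: str, schema: Dict[str, Callable[[str], object]]
) -> Iterator[dict]:
    """Apply a schema (column name -> converter) while reading the raw text."""
    names = list(schema)
    for row in csv.reader(io.StringIO(raw)):
        # zip stops at the shorter input, so a schema with fewer
        # columns simply ignores the trailing fields.
        yield {name: schema[name](value) for name, value in zip(names, row)}


# Two readers interpret the same stored bytes differently.
billing_schema = {"txn_id": int, "date": str, "amount": float}
audit_schema = {"txn_id": str, "date": str}

billing = list(read_with_schema(RAW, billing_schema))
audit = list(read_with_schema(RAW, audit_schema))
print(billing[0])
print(audit[1])
```

With schema on write, as in a traditional data warehouse, the structure would have to be fixed before loading; here a new interpretation only needs a new schema dictionary.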

Page 20: Hadoop for beginners   free course ppt

Hadoop Ecosystem

The umbrella of tools around Hadoop…

Page 21: Hadoop for beginners   free course ppt

Hadoop Ecosystem

[Diagram: Hadoop ecosystem. Components: HDFS, MapReduce, HBase, Pig, Hive, Sqoop/Flume, Storm, Chukwa, Kafka, Oozie. Other labels: Yahoo, Facebook, Log collection, Structured Stores, Message broker.]

Page 22: Hadoop for beginners   free course ppt

A Few Interesting Discussions

Hadoop-skills.com: bringing Hadoop learners together…

Page 23: Hadoop for beginners   free course ppt

Simpler Vs Complex Algorithms

Complex Algorithm on a small dataset

Simple Algorithm on a large dataset

1. Complex algorithms need to be correctly sensitive to weak correlations.
2. Complex algorithms are thus difficult to code and design.

Page 24: Hadoop for beginners   free course ppt

Data Engineer Vs Data Scientist

Data Engineer
Role: To engineer software solutions.
Skills: More programming and technical skills, and the ability to architect technical solutions.

Data Scientist
Role: To solve business problems using data.
Skills: Strong mathematical skills and an understanding of statistical models.

Page 25: Hadoop for beginners   free course ppt

Hadoop Vendors

-> Skeleton version.
-> All ecosystem components need to be installed separately.
-> Important ecosystem members included.
-> A few proprietary tools, such as Enterprise Manager.
-> Proprietary Hadoop code written in C.
-> Integrated with Hadoop ecosystem members.
-> Based on Apache Hadoop.
-> Supports the .NET framework.
-> Launches its own Hadoop distribution: Pivotal HD.

Page 26: Hadoop for beginners   free course ppt

Thank You!!!

Page 27: Hadoop for beginners   free course ppt

I got a lucky chance to meet Doug!!! And to explain what little I am doing with Hadoop…

[Photo captions: "Superstar: Doug!!!", "A small fan: Me", "And the real Hadoop"]