Hadoop Syllabus

Trainer – Milind Jagre

1. Introduction to Big Data

a. What is Big Data?

b. What are the challenges for processing big data?

c. What technologies support big data?

d. The V’s of Big Data and its growth

2. Introduction to Hadoop

a. An Overview of Hadoop

b. History of Hadoop

c. The Hadoop Distributed File System

d. MapReduce Programming model

e. Hadoop Ecosystem

3. Hadoop Cluster Setup

a. HDFS Design Goals

b. NameNode (NN), Secondary NameNode (SNN) and DataNodes (DN)

c. JobTracker (JT) and TaskTracker (TT)

d. Replica and Block Placement

e. HDFS commands

f. Read and Write Flow

4. Apache MapReduce

a. Components

b. Programming Model

c. Configuring and Writing MapReduce jobs in IDE

5. Hive

a. Introduction

b. Installation and Configuration

c. Data Types and File Formats

d. Loading data into an internal table

e. Loading data into an external table

f. Views in Hive

g. Indexes in Hive

h. Performance tuning in Hive

6. Pig Latin

a. Installation and Configuration

b. What is the Grunt shell?

c. Command Syntax

d. Data Model of Pig

e. Pig Script for word count

f. Java Code for running the Pig word count

7. Sqoop

a. Installation and Configuration

b. Importing data with sqoop-import

c. Free-form query import with sqoop-import

d. Exporting data with sqoop-export

8. Oozie

a. What is Oozie?

b. Why do we use it?

c. Oozie Architecture

d. Oozie action nodes

9. NoSQL

a. Introduction and Interaction

b. Storage Architecture

c. CRUD Operations

d. Query NoSQL Stores

e. Modifying Data Stores

f. Indexing

g. Managing Transactions

h. NoSQL in the cloud

i. Parallel Processing

j. Performance Tuning

k. Tools and Utilities
