
hortonworks.com ©2018 Hortonworks Training Offering

This two-day training course is designed for developers who need to create applications to analyze Big Data stored in Apache Hadoop using Apache Pig and Apache Hive. Topics include an essential understanding of Hortonworks Data Platform (HDP) and its capabilities, Hadoop, YARN, HDFS, MapReduce/Tez, data ingestion, and using Pig and Hive to perform data analytics on Big Data.

PREREQUISITES

Students should be familiar with programming principles and have experience in software development. SQL and light scripting knowledge are also helpful. No prior Hadoop knowledge is required.

TARGET AUDIENCE

Developers and data engineers who need to understand and develop Hive applications on HDP.

FORMAT

• 50% Lecture/Discussion
• 50% Hands-on labs

Training Offering: Apache Hadoop Ecosystem Full Stack Architecture

© 2011-2018 Hortonworks Inc. All Rights Reserved. Privacy Policy | Terms of Service

About Hortonworks

Hortonworks is a leading provider of enterprise-grade, global data management platforms, services and solutions that deliver actionable intelligence from any type of data for over half of the Fortune 100. Hortonworks is committed to driving innovation in open source communities, providing unique value to enterprise customers. Along with its partners, Hortonworks provides technology, expertise and support so that enterprise customers can adopt a modern data architecture. For more information, visit hortonworks.com.

THE INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS” WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT.

Contact

+1 408 675-0983
+1 855 8-HORTON
INTL: +44 (0) 20 3826 1405

For further information, visit hortonworks.com

03/2018

COURSE OBJECTIVES

• Describe the Case for Hadoop
• Describe the Trends of Volume, Velocity and Variety
• Discuss the Importance of Open Enterprise Hadoop
• Describe the Hadoop Ecosystem Frameworks Across the Following Five Architectural Categories:
  – Data Management
  – Data Access
  – Data Governance & Integration
  – Security
  – Operations
• Describe the Function and Purpose of the Hadoop Distributed File System (HDFS)
• List the Major Architectural Components of HDFS and their Interactions
• Describe Data Ingestion
• Describe Batch/Bulk Ingestion Options
• Describe the Streaming Framework Alternatives
• Describe the Purpose and Function of MapReduce
• Describe the Purpose and Components of YARN
• Describe the Major Architectural Components of YARN and their Interactions
• Define the Purpose and Function of Apache Pig
• Work with the Grunt Shell
• Work with Pig Latin Relation Names and Field Names
• Describe the Pig Data Types and Schema
• Demonstrate Common Operators Such as:
  – ORDER BY
  – CASE
  – DISTINCT
  – PARALLEL
  – FOREACH

• Understand how Hive Tables are Defined and Implemented

• Use Hive to Explore and Analyze Data Sets

• Explain and Use the Various Hive File Formats

• Understand the Benefits of a Hive Table that Uses the ORC File Format

• Use Hive to Run SQL-like Queries to Perform Data Analysis

• Use Hive to Join Datasets Using a Variety of Techniques

• Write Efficient Hive Queries
• Explain the Uses and Purpose of HCatalog
• Use HCatalog with Pig and Hive

HANDS-ON LABS

• Starting an HDP Cluster
• Using HDFS Commands
• Exploring a MapReduce Program
• Getting Started with Apache Pig
• Exploring Data with Pig
• Splitting a Dataset
• Joining Datasets
• Preparing Data for Apache Hive
• Understanding Apache Hive Tables
• Demonstration: Understanding Partitions and Skew
• Analyzing Big Data with Apache Hive
• Joining Datasets in Apache Hive
• Using HCatalog with Apache Pig
