Getting Started with Hadoop

Upload: joel-bond

Post on 19-Dec-2015


Who We Are

How We Do It

We deliver relevant products and services.

A distribution of Apache Hadoop that is tested, certified and supported

Comprehensive support and professional service offerings

A suite of management software for Hadoop operations

Training and certification programs for developers, administrators, managers and data scientists

Technical Team

Unmatched knowledge and experience.

Founders, committers and contributors to Hadoop

A wealth of experience in the design and delivery of production software

Credentials

The Apache Hadoop experts.

Number 1 distribution of Apache Hadoop in the world

Largest contributor to the open source Hadoop ecosystem

More committers on staff than any other company

More than 100 customers across a wide variety of industries

Strong growth in revenue and new accounts

Mission: To help organizations profit from their data

Leadership

Strong executive team with proven abilities.

Mike Olson, CEO

Kirk Dunn, COO

Charles Zedlewski, VP, Product

Mary Rorabaugh, CFO

Jeff Hammerbacher, Chief Scientist

Amr Awadalla, VP Engineering

Doug Cutting, Chief Architect

Omer Trajman, VP, Customer Solutions

©2011 Cloudera, Inc. All Rights Reserved.


Users of Cloudera

Financial, Web, Retail & Consumer, Media, Telecom


What is Apache Hadoop?

Hadoop is a platform for data storage and processing that is scalable, fault tolerant and open source.

CORE HADOOP COMPONENTS

Hadoop Distributed File System (HDFS): File Sharing & Data Protection Across Physical Servers

MapReduce: Distributed Computing Across Physical Servers

Flexibility

• A single repository for storing, processing & analyzing any type of data

• Not bound by a single schema

Scalability

• Scale-out architecture divides workloads across multiple nodes

• Flexible file system eliminates ETL bottlenecks

Low Cost

• Can be deployed on commodity hardware

• Open source platform guards against vendor lock-in


What Makes Hadoop Different?

• Ability to scale out to petabytes in size using commodity hardware

• Processing (MapReduce) jobs are sent to the data, rather than shipping the data to be processed

• Hadoop doesn’t impose a single data format, so it can easily handle structured, semi-structured and unstructured data

• Manages fault tolerance and data replication automatically


Why the Need for Hadoop?

1.8 trillion gigabytes of data was created in 2011…

More than 90% is unstructured data

Approx. 500 quadrillion files

Quantity doubles every 2 years

[Chart: gigabytes of data created (in billions), structured data vs. unstructured data, 2005 to 2015; vertical axis from 0 to 10,000]

Source: IDC 2011
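The doubling claim can be turned into a quick projection. The sketch below takes the 2011 figure of 1.8 trillion gigabytes as the baseline and assumes a clean doubling every 2 years; `projected_gigabytes` is a hypothetical helper written for this illustration, and real growth will not follow the curve exactly.

```python
BASELINE_YEAR = 2011
BASELINE_GB = 1.8e12  # 1.8 trillion gigabytes, the 2011 figure quoted on the slide
DOUBLING_PERIOD_YEARS = 2

def projected_gigabytes(year):
    """Project data created in `year`, assuming a clean doubling every 2 years."""
    doublings = (year - BASELINE_YEAR) / DOUBLING_PERIOD_YEARS
    return BASELINE_GB * 2 ** doublings

for year in (2011, 2013, 2015):
    print(year, f"{projected_gigabytes(year) / 1e12:.1f} trillion GB")
```

Under these assumptions, two doublings separate 2011 from 2015, so the 2015 projection is four times the baseline.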


Hadoop Use Cases

Industry | Advanced Analytics Use Case | Data Processing Use Case

Web | Social Network Analysis | Clickstream Sessionization

Media | Content Optimization | Clickstream Sessionization

Telco | Network Analytics | Mediation

Retail | Loyalty & Promotions Analysis | Data Factory

Financial | Fraud Analysis | Trade Reconciliation

Federal | Entity Analysis | SIGINT

Bioinformatics | Sequencing Analysis | Genome Mapping


Hadoop in the Enterprise

[Diagram: Hadoop in the enterprise]

Data sources: Logs, Files, Web Data, Relational Databases

Connected systems: IDEs, BI / Analytics, Enterprise Reporting, Enterprise Data Warehouse, Web Application, Management Tools

Users: Operators, Engineers, Analysts, Business Users, Customers


What is CDH?

Cloudera’s Distribution Including Apache Hadoop (CDH) is an enterprise-ready distribution of Hadoop that is…

• 100% Apache open source

• Contains all components needed for deployment

• Fully documented and supported

• Released on a reliable schedule

Fastest Path to Success

• No need to write your own scripts or do integration testing on different components

• Works with a wide range of operating systems, hardware, databases and data warehouses

Stable and Reliable

• Extensive Cloudera QA systems, software & processes

• Tested & run in production at scale

• Proven at scale in dozens of enterprise environments

Community Driven

• Incorporates only main-line components from the Apache Hadoop ecosystem, with no forks or proprietary underpinnings

FREE


Cloudera’s Commitment to the Open Source Community

Component | Cloudera Committers | Cloudera Founder | 2011 Commits

Common | 6 | Yes | #1

HDFS | 6 | Yes | #2

MapReduce | 5 | Yes | #1

HBase | 2 | No | #2

Zookeeper | 1 | Yes | #2

Oozie | 1 | Yes | #1

Pig | 0 | No | #3

Hive | 1 | No | #2

Sqoop | 2 | Yes | #1

Flume | 3 | Yes | #1

Hue | 3 | Yes | #1

Snappy | 2 | No | #1

Bigtop | 8 | Yes | #1

Avro | 4 | Yes | #1

Whirr | 2 | Yes | #1


Components of CDH

Coordination: APACHE ZOOKEEPER

Data Integration: APACHE FLUME, APACHE SQOOP

Fast Read/Write Access: APACHE HBASE

Languages / Compilers: APACHE PIG, APACHE HIVE

Workflow: APACHE OOZIE

Scheduling: APACHE OOZIE

File System Mount: FUSE-DFS

User Interface: HUE

Cloudera Enterprise

Hadoop Distributed File System

Block Size = 64MB

Replication Factor = 3

Cost is $400-$500/TB

[Figure: HDFS diagram showing a file split into blocks 1 to 5, with copies of each block distributed across several nodes]
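The block size and replication factor translate directly into how much a file costs to store. A minimal sketch, assuming the 64MB block size and replication factor of 3 from the slide; `hdfs_footprint` is a hypothetical helper, not part of any Hadoop API, and the 300MB file is an invented example.

```python
import math

BLOCK_SIZE_MB = 64       # block size from the slide
REPLICATION_FACTOR = 3   # replication factor from the slide

def hdfs_footprint(file_size_mb):
    """Return (block count, total replicas, raw MB consumed) for one file."""
    blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    replicas = blocks * REPLICATION_FACTOR
    # The final block holds only the remaining bytes, so raw usage is
    # the file size times the replication factor, not blocks * 64MB.
    raw_mb = file_size_mb * REPLICATION_FACTOR
    return blocks, replicas, raw_mb

blocks, replicas, raw_mb = hdfs_footprint(300)
print(blocks, replicas, raw_mb)  # 5 15 900
```

A 300MB file therefore occupies five blocks, stored as fifteen block replicas and 900MB of raw disk across the cluster.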


Components of Hadoop

• NameNode

  – Holds all metadata for HDFS

  – Needs to be a highly reliable machine

    • RAID drives, typically RAID 10

    • Dual power supplies

    • Dual network cards, bonded

  – The more memory the better, typically 36GB to 64GB

• Secondary NameNode

  – Provides checkpointing for the NameNode. The same hardware as the NameNode should be used


Components of Hadoop

• DataNodes

  – Hardware will depend on the specific needs of the cluster

  – No RAID needed, JBOD (just a bunch of disks) is used

  – Typical ratio is:

    • 1 hard drive

    • 2 cores

    • 4GB of RAM
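The typical ratio can be scaled to a whole node. The sketch below applies the 1 hard drive : 2 cores : 4GB of RAM ratio from the slide; `datanode_sizing` is a hypothetical helper and the 12-drive node is an invented example, since actual hardware depends on the needs of the cluster.

```python
def datanode_sizing(hard_drives):
    """Scale cores and RAM from the drive count using the 1 : 2 : 4GB ratio."""
    return {
        "hard_drives": hard_drives,
        "cores": 2 * hard_drives,
        "ram_gb": 4 * hard_drives,
    }

print(datanode_sizing(12))
```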


Networking

• One of the most important things to consider when setting up a Hadoop cluster

• Typically a top-of-rack switch is used with Hadoop, together with a core switch

• Be careful not to oversubscribe the backplane of the switch!
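A quick way to check for oversubscription is to compare the worst-case traffic from all ports against the switching capacity. The sketch below is a simplification that treats each port as a single direction of traffic; `oversubscription_ratio` and the switch figures are hypothetical, so substitute the numbers from the actual switch data sheet.

```python
def oversubscription_ratio(port_count, port_gbps, backplane_gbps):
    """Ratio of worst-case port demand to backplane capacity.

    A result above 1.0 means the ports can offer more traffic than the
    backplane can switch, i.e. the switch is oversubscribed.
    """
    demand_gbps = port_count * port_gbps
    return demand_gbps / backplane_gbps

# Hypothetical top-of-rack switch: 48 ports at 1 Gbps, 32 Gbps backplane.
ratio = oversubscription_ratio(48, 1, 32)
print(f"{ratio:.2f}:1")  # 1.50:1
```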


Map

• Records from the data source (lines out of files, rows of a database, etc.) are fed into the map function as key-value pairs: e.g., (filename, line).

• map() produces one or more intermediate values along with an output key from the input.

[Diagram: the Map Task emits (key 1, values), (key 2, values) and (key 3, values); the Shuffle Phase gathers the intermediate values for key 1; the Reduce Task produces the final (key, values)]
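As an illustration of the map step, here is a minimal sketch in plain Python. It assumes a word-count job, which the slide does not specify; `map_fn` and the sample file name are hypothetical, and a real job would use the Hadoop MapReduce API instead.

```python
def map_fn(filename, line):
    """Take one (filename, line) record and emit a (word, 1) pair per word."""
    for word in line.lower().split():
        yield (word, 1)

pairs = list(map_fn("notes.txt", "to be or not to be"))
print(pairs)
```

Each output pair carries an output key (the word) and an intermediate value (the count 1), ready to be grouped by key.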


Reduce

• After the map phase is over, all the intermediate values for a given output key are combined together into a list

• reduce() combines those intermediate values into one or more final values for that same output key

[Diagram repeated from the Map slide: Map Task, Shuffle Phase, Reduce Task, Final (key, values)]
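The grouping and combining steps can be sketched in plain Python. This assumes a word-count job, which the slide does not specify; `shuffle` and `reduce_fn` are hypothetical names, and the in-memory dictionary only stands in for what the framework does across machines.

```python
from collections import defaultdict

def shuffle(pairs):
    """Group intermediate values by output key, as the shuffle phase does."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_fn(key, values):
    """Combine the list of intermediate values into one final value."""
    return (key, sum(values))

intermediate = [("to", 1), ("be", 1), ("or", 1), ("not", 1), ("to", 1), ("be", 1)]
final = [reduce_fn(k, v) for k, v in sorted(shuffle(intermediate).items())]
print(final)  # [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```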


MapReduce Execution


Sqoop

SQL to Hadoop

Tool to import/export any JDBC-supported database into Hadoop

Transfer data between Hadoop and external databases or EDW

High performance connectors for some RDBMS

Developed at Cloudera


Flume

Distributed, reliable, available service for efficiently moving large amounts of data as it is produced

Suited for gathering logs from multiple systems and inserting them into HDFS as they are generated

Design goals

Reliability, Scalability, Manageability, Extensibility

Developed at Cloudera


Flume: high-level architecture

[Diagram: Agents, Processors, Collector(s) and a Master; decorators shown along the path: compress, encrypt, batch]

Configurable levels of reliability

Guarantee delivery in event of failure

Deployable, centrally administered

Flexibly deploy decorators at any step to improve performance, reliability or security

Optionally pre-process incoming data: perform transformations, suppressions, metadata enrichment

Writes to multiple HDFS file formats (text, sequence, JSON, Avro, others)

Parallelized writes across many collectors – as much write throughput as

Master sends configuration to all Agents


HBase

Column-family store, based on the design of Google Bigtable

Provides interactive access to information

Holds extremely large datasets (multi-TB)

Constrained access model

(key, value) lookup

Limited transactions (only one row)
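To make the access model concrete, the sketch below models a column-family store as nested dictionaries: a row key maps to "family:qualifier" columns. `ToyColumnFamilyStore` is a hypothetical in-memory stand-in written for this illustration; it is not the HBase client API and ignores persistence, versions and distribution.

```python
from collections import defaultdict

class ToyColumnFamilyStore:
    """In-memory stand-in: row key -> "family:qualifier" -> value."""

    def __init__(self):
        self.rows = defaultdict(dict)

    def put(self, row_key, column, value):
        # Every write touches exactly one row, mirroring the
        # "only one row" transaction limit on the slide.
        self.rows[row_key][column] = value

    def get(self, row_key, column=None):
        # Lookups go through the row key; there is no query language.
        row = self.rows.get(row_key, {})
        return dict(row) if column is None else row.get(column)

store = ToyColumnFamilyStore()
store.put("user42", "info:name", "Ada")
store.put("user42", "stats:logins", 7)
print(store.get("user42", "info:name"))  # Ada
print(store.get("user42"))
```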


Hive

SQL-based data warehousing application

Language is SQL-like

Supports SELECT, JOIN, GROUP BY, etc.

Features for analyzing very large data sets

Partition columns, Sampling, Buckets

Example:

SELECT s.word, s.freq, k.freq FROM shakespeare s JOIN kjv k ON (s.word = k.word) WHERE s.freq >= 5;


Pig

Data-flow oriented language – “Pig Latin”

Datatypes include sets, associative arrays, tuples

High-level language for routing data, allows easy integration of Java for complex tasks

Example:

emps = LOAD 'people.txt' AS (id, name, salary);
rich = FILTER emps BY salary > 100000;
srtd = ORDER rich BY salary DESC;
STORE srtd INTO 'rich_people.txt';
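For readers new to Pig, the same data flow can be sketched in plain Python. The rows below are invented sample data standing in for people.txt, and the in-memory list replaces the LOAD and STORE steps; Pig would run the equivalent steps as MapReduce jobs on the cluster.

```python
# Hypothetical rows standing in for people.txt: (id, name, salary)
emps = [(1, "Ann", 150000), (2, "Bob", 90000), (3, "Cy", 120000)]

rich = [row for row in emps if row[2] > 100000]            # FILTER emps BY salary > 100000
srtd = sorted(rich, key=lambda row: row[2], reverse=True)  # ORDER rich BY salary DESC

print(srtd)  # [(1, 'Ann', 150000), (3, 'Cy', 120000)]
```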


Oozie

Oozie is a workflow/coordination service to manage data processing jobs for Hadoop


ZooKeeper

ZooKeeper is a distributed consensus engine

Provides well-defined concurrent access semantics:

Leader election

Service discovery

Distributed locking / mutual exclusion

Message board / mailboxes
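Leader election is commonly built on ZooKeeper by giving each candidate a sequentially numbered, session-bound node and treating the lowest number as the leader. The sketch below simulates that idea in memory only; `ToyElection` and the node names are hypothetical, and no ZooKeeper client calls are made.

```python
import itertools

class ToyElection:
    """Simulates the 'lowest sequence number wins' election idea in memory."""

    def __init__(self):
        self._sequence = itertools.count()
        self.candidates = {}  # sequence number -> candidate name

    def join(self, name):
        seq = next(self._sequence)
        self.candidates[seq] = name
        return seq

    def leave(self, seq):
        # Mirrors a session-bound node vanishing when its session ends.
        self.candidates.pop(seq, None)

    def leader(self):
        if not self.candidates:
            return None
        return self.candidates[min(self.candidates)]

election = ToyElection()
a = election.join("node-a")
b = election.join("node-b")
print(election.leader())  # node-a
election.leave(a)
print(election.leader())  # node-b
```

When the current leader disappears, the candidate with the next lowest sequence number takes over without any further negotiation.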


Pipes and Streaming

Multi-language connector libraries for MapReduce

Write native-code MapReduce in C++

Write MapReduce passes in any scripting language, including Perl and Python
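With Streaming, a MapReduce pass is an ordinary program that reads lines and writes tab-separated key/value lines. The sketch below shows a word-count mapper and reducer as Python functions; word count is an assumed example, and the function names and sample input are hypothetical.

```python
from itertools import groupby

def mapper(lines):
    """Emit 'word<TAB>1' for every word, one record per output line."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    """Sum counts per word; input must already be sorted by key."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

# Local dry run: sorted() stands in for the sort Hadoop performs between the steps.
sample = ["to be or", "not to be"]
results = list(reducer(sorted(mapper(sample))))
print(results)
```

In an actual Streaming job the mapper and reducer would be two separate scripts, each looping over standard input and printing to standard output.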


FUSE - DFS

Allows mounting of HDFS volumes via the Linux FUSE file system

Does allow easy integration with other systems for data import/export

Does not imply HDFS can be used as a general-purpose file system


Hadoop Security

Authentication is secured by Kerberos v5 and integrated with LDAP

Hadoop server can ensure that users and groups are who they say they are

Job Control includes Access Control Lists, which means Jobs can specify who can view logs, counters, configurations and who can modify a job

Tasks now run as the user who launched the job


Cloudera Enterprise

Cloudera Enterprise makes open source Hadoop enterprise-easy

• Simplify and Accelerate Hadoop Deployment

• Reduce Adoption Costs and Risks

• Lower the Cost of Administration

• Increase the Transparency and Control of Hadoop

• Leverage the Experience of Our Experts

EFFECTIVENESS: Ensuring You Get Value From Your Hadoop Deployment

EFFICIENCY: Enabling You to Affordably Run Hadoop in Production

CLOUDERA ENTERPRISE COMPONENTS

Cloudera Manager: End-to-End Management Application for Apache Hadoop

Production-Level Support: Our Team of Experts On-Call to Help You Meet Your SLAs


Cloudera Manager

The industry’s first end-to-end management application for Apache Hadoop

Proactively manages the Apache Hadoop stack

Automates the full operational lifecycle of Apache Hadoop

Lifecycle stages shown: DISCOVER, DIAGNOSE, OPTIMIZE, ACT

Components shown: HDFS, MAPREDUCE, HBASE, ZOOKEEPER, OOZIE, HUE


Cloudera Enterprise

Including Cloudera Support

Feature | Benefit

Flexible Support Windows | Choose from 8x5 or 24x7 options to meet SLA requirements

Configuration Checks | Verify that your Hadoop cluster is fine-tuned for your environment

Issue Resolution and Escalation Processes | Proven processes ensure that support cases get resolved with maximum efficiency

Comprehensive Knowledgebase | Browse through hundreds of Articles and Tech Notes to expand upon your knowledge of Apache Hadoop

Certified Connectors | Connect your Apache Hadoop cluster to your existing data analysis tools such as IBM Netezza and Revolution Analytics

Notification of New Developments and Events | Stay up to speed with what’s going on in the Apache Hadoop community


Cloudera University

Public and Private Training to Enable Your Success

Class | Description

Developer Training & Certification (4 Days) | Hands-on training and certification for developers who want to analyze their data but are new to Apache Hadoop

System Administrator Training & Certification (3 Days) | Hands-on training and certification for administrators who will be responsible for setting up, configuring and monitoring an Apache Hadoop cluster

HBase Training (2 Days) | Covers the HBase architecture, data model, and Java API as well as some advanced topics and best practices

Analyzing Data with Hive and Pig (2 Days) | Hive and Pig training is designed for people who have a basic understanding of how Apache Hadoop works and want to utilize these languages for analysis of their data

Essentials for Managers (1 Day) | Provides decision-makers with the information they need to know about Apache Hadoop, answering questions such as “when is Hadoop appropriate?”, “what are people using Hadoop for?” and “what do I need to know about choosing Hadoop?”


Cloudera Consulting Services

Put Our Expertise To Work For You.

Service | Description

Use Case Discovery | Assess the appropriateness and value of Hadoop for your organization

New Hadoop Deployment | Set up and configure high performance, production-ready Hadoop clusters

Proof of Concept | Verify the prototype functionality and project feasibility for a new Hadoop cluster

Production Pilot | Deploy your first production-level project using Hadoop

Process and Team Development | Define the requirements and processes for creating a new Hadoop team

Hadoop Deployment Certification | Perform periodic health checks to certify and tune up existing Hadoop clusters

Cloudera’s team of Solutions Architects provides guidance and hands-on expertise to address unique enterprise challenges.


Journey of the Cloudera Customer

Discover the Benefits of Apache Hadoop: Flexibility to store and mine all types of data

Cloudera’s Distribution: The fastest, surest path to success with Apache Hadoop

Subscribe to Cloudera Enterprise: Simplify and accelerate Apache Hadoop deployment


Cloudera in Production

[Diagram: Cloudera in production]

Data sources: Logs, Files, Web Data, Relational Databases

Connected systems: IDEs, BI / Analytics, Enterprise Reporting, Enterprise Data Warehouse, Operational Rules Engines, Management Tools, Web Application

Users: Operators, Engineers, Analysts, Business Users, Customers

Cloudera offerings shown:

• Cloudera’s Distribution Including Apache Hadoop (CDH) & SCM Express

• Cloudera Enterprise: Cloudera Management Suite, Cloudera Support

• Cloudera Services: Consulting Services, Cloudera University


Cloudera helps you profit from all your data.

cloudera.com

+1 (888) [email protected]

twitter.com/cloudera

facebook.com/cloudera

Get Hadoop


Cloudera Manager

The first and only Hadoop management application that:

1. Manages the full Hadoop lifecycle

2. Manages and monitors the complete Hadoop stack

3. Incorporates comprehensive log and event management

4. Has Technical Support integration built-in


Cloudera Manager

Key Features and Functionality:

Automated Deployment: Installs the complete Hadoop stack in minutes. The simple, wizard-based interface guides you through the steps.

Centralized Management: Gives you complete, end-to-end visibility and control over your Hadoop cluster from a single interface

Service & Configuration Management: Set server roles, configure services and manage security across the cluster. Gracefully start, stop and restart services as needed

Audit Trails: Maintains a complete record of configuration changes for SOX compliance

Proactive Health Checks: Monitors dozens of service performance metrics and alerts you when you approach critical thresholds

Intelligent Log Management: Gather, view and search Hadoop logs collected from across the cluster. Scans Hadoop logs for irregularities and warns you before they impact the cluster

[Five of these features carry an “ONLY CLOUDERA” badge on the slide]


Cloudera Manager

Key Features and Functionality:

Global Time Control: Establishes the time context globally for almost all views. Correlates jobs, activities, logs, system changes, configuration changes and service metrics along a single timeline to simplify diagnosis

Support Integration: Takes a snapshot of the cluster state and automatically sends it to Cloudera support to assist with resolution

Event Management: Creates and aggregates relevant Hadoop events pertaining to system health, log messages, user services and activities and makes them available for alerting and searching

Alerting: Generates email alerts when certain events occur

Operational Reports: Visualize current and historical disk usage by user, group and directory. Track MapReduce activity on the cluster by job or user

Host Level Monitoring: View information pertaining to hosts in your cluster including status, resident memory, virtual memory and roles

[Four of these features carry an “ONLY CLOUDERA” badge on the slide]


Two Editions: FREE EDITION and ENTERPRISE EDITION**

** Part of the Cloudera Enterprise subscription

Max Number of Nodes Supported: 50 (Free Edition), Unlimited (Enterprise Edition)

[Table: per-edition availability marks for the features below]

• Automated Deployment

• Host-Level Monitoring

• Secure Communication Between Server & Agents

• Configuration Management

• Manage HDFS, MapReduce, HBase, Hue, Oozie & Zookeeper

• Audit Trails

• Start/Stop/Restart Services

• Add/Restart/Decommission Role Instances

• Configuration Versioning & History

• Support for Kerberos

• Service Monitoring

• Proactive Health Checks

• Status & Health Summary

• Intelligent Log Management

• Events Management & Alerts

• Activity Monitoring

• Operational Reporting

• Global Time Control

• Support Integration


View Service Health and Performance


Get Host-Level Snapshots


Monitor and Diagnose Cluster Workloads


Gather, View and Search Hadoop Logs


Track Events From Across the Cluster


Run Reports on System Performance & Usage


New in Cloudera Manager 3.7

1. Proactive Health Checks: Monitors dozens of service performance metrics and alerts you when you approach critical thresholds

2. Intelligent Log Management: Gathers and scans Hadoop logs for irregularities and warns you before they impact the cluster

3. Global Time Control: Correlates jobs, activities, logs, system changes, configuration changes and service metrics along a single timeline to simplify diagnosis

4. Support Integration: Takes a snapshot of the cluster state and automatically sends it to Cloudera support to assist with resolution

5. Event Management: Creates and aggregates relevant Hadoop events pertaining to system health, log messages, user services and activities and makes them available for alerting and searching

6. Alerts: Generates email alerts when certain events occur

7. Audit Trails: Maintains a complete record of configuration changes for SOX compliance

8. Operational Reporting: Visualize current and historical disk usage by user, group and directory and track MapReduce activity on the cluster by job or user

[Seven of these features carry an “ONLY CLOUDERA” badge on the slide]


Cloudera Support

Our team of experts on call to help you meet your SLAs

Feature | Benefit

Flexible Support Windows | Choose from 8x5 or 24x7 options to meet SLA requirements

Configuration Checks | Verify that your Hadoop cluster is fine-tuned for your environment

Issue Resolution and Escalation Processes | Proven processes ensure that support cases get resolved with maximum efficiency

Comprehensive Knowledgebase | Browse through hundreds of Articles and Tech Notes to expand upon your knowledge of Apache Hadoop

Certified Connectors | Connect your Apache Hadoop cluster to your existing data analysis tools such as IBM Netezza, Revolution Analytics, and MicroStrategy

Proactive Notification of New Developments and Events | Stay up to speed with what’s going on in the Apache Hadoop community


Cloudera Enterprise

Why Cloudera Enterprise?

Apache Hadoop is a distributed system that presents unique operational challenges

The fixed cost of managing an internal patch and release infrastructure is prohibitive

Apache Hadoop skills and expertise are scarce

It’s challenging to track consistently to community development efforts

Only Cloudera Enterprise

• Has a management application that supports the full lifecycle of operationalizing Apache Hadoop

• Has production support backed by the Apache committers

• Has the depth of experience supporting hundreds of production Apache Hadoop clusters

The Fastest Path to Success Running Apache Hadoop in Production.