Introduction to the Oracle Big Data Appliance

Post on 20-May-2020


Hello and welcome to this online, self-paced course titled Administering and Managing the

Oracle Big Data Appliance (BDA). This course contains several lessons. This lesson is titled

Introduction to the Oracle Big Data Appliance (BDA). My name is Lauran Serhal. I am a

curriculum developer at Oracle and I have helped educate customers on Oracle products

since 1995. I'll be guiding you through this course, which consists of lectures, demos, and

review sessions.

The goal of this lesson is to help you understand how to set up and configure the Oracle BDA and to identify some of its hardware and software components.

Introduction to the Oracle Big Data Appliance - 1


Introduction

Before we begin, take a look at some of the features of this course player. If you’ve viewed a

similar self-paced course in the past, feel free to skip this slide.

Menu

This is the Menu tab. It’s set up to automatically progress through the course in a linear

fashion, but you can also review the material in any order. Just click a slide title in the outline

to display its contents.

Notes

Click the Notes tab to view the audio transcript for each slide.

Search

Use the Search field to find specific information in the course.

Player Controls

Use these controls to pause, play, or move to the previous or next slide. Use the interactive

progress bar to fast forward or rewind the current slide. Some interactive slides in this course

may contain additional navigation and controls. The view for certain slides may change so

that you can see additional details.

Resources (Optional)

Click the Resources button to access any attachments associated with this course.

Glossary (Optional)

Click the Glossary button to view key terms and their definitions.


So, you know the title of the course, but you may be asking yourself, “Is this the right course

for me?” Click the bars to learn about the course objectives, target audience, and

prerequisites.


What can you expect to get out of this course? Here are the core learning objectives.

After completing this course, you should be able to define the Hadoop ecosystem and its components, including the Hadoop Distributed File System (HDFS), MapReduce, Spark, YARN, and some related projects. You will also learn how to complete the BDA Site Checklists, run the Oracle BDA Configuration Utility, and install the Oracle BDA Mammoth software on the Oracle BDA. Finally, you will learn how to use, administer, and manage the Oracle Big Data Appliance.


Who is this course for? Here is the intended audience.

• Application Developers

• Database Administrators

• Hadoop/Big Data Cluster Administrators

• Hadoop Programmers


Before taking this course, you should have basic knowledge of Hadoop, exposure to Big

Data, and optionally some basic database knowledge.


This course covers the following lessons:

In the Introduction to the Hadoop Ecosystem lesson, you define the Hadoop ecosystem and describe the Hadoop core components and the other related projects in the ecosystem. You also learn how to choose a Hadoop distribution and version, review the architectural components of HDFS, and use the FS shell command-line interface (CLI) to interact with data stored in HDFS. Finally, you review MapReduce, Spark, and YARN.

In the Introduction to the Oracle BDA lesson, you identify the Oracle Big Data Appliance

(BDA) and its hardware and software components.

In the Oracle BDA Pre-Installation Steps lesson, you learn how to download and complete

the BDA Site Checklists. You also learn how to download and run the Oracle BDA

Configuration Utility and then review the generated configuration files. Finally, you identify the

next steps in installing the Oracle BDA.

In the Working With Mammoth lesson, you learn how to download the Oracle BDA

Mammoth Software Deployment Bundle from My Oracle Support. You also learn how to install

a CDH or NoSQL cluster based on your specifications. You then learn how to install the

Oracle BDA Mammoth Software Deployment Bundle using the Mammoth utility.


Now that you’ve learned about some of the other lessons, let’s take a look at the Introduction

to the Oracle Big Data Appliance lesson.

Oracle Big Data Appliance is a flexible, high-performance, and secure engineered system for

running diverse workloads on Hadoop and NoSQL systems.

After completing this lesson, you should be able to:

• Set up and configure the Oracle BDA

• Identify some of the hardware and software components of the Oracle BDA


Oracle simplifies the deployment of big data solutions by delivering engineered systems. The BDA and Exadata form the backbone of the Oracle Big Data Management System (BDMS). Together they provide the capabilities required to capture data from sources, transform that data, and analyze it. The BDA ships with Cloudera Distribution Including Hadoop (CDH), specifically the enterprise data hub. It includes components such as HDFS, YARN, Spark, Oozie (workflow), Cloudera Manager, Sentry (security), HDFS encryption, Navigator, and more.

Oracle then adds its own components for integrating the BDA with Oracle Database and for performing discovery and analysis:

1. Big Data Discovery allows business users to find interesting data sets, wrangle the data, and then explore it, all through an intuitive UI with no programming required.

2. Oracle NoSQL Database is a leading key-value store. Oracle R Advanced Analytics for Hadoop (ORAAH) enables R users to analyze data on the cluster in their preferred analytic language while leveraging the processing power of the cluster.

3. Big Data Spatial and Graph is a new offering that allows both spatial analysis and the ability to understand relationships and spheres of influence in data.

4. Big Data Connectors enable integration with Oracle Database as well as analysis using the XQuery language.

5. Big Data SQL enables unified analysis across all your data. It allows you to securely query data from NoSQL, Oracle Database, and HDFS using Oracle’s rich SQL query language.


The Oracle BDA is a flexible, high-performance, secure platform for running diverse

workloads on Hadoop and NoSQL systems. With Oracle Big Data SQL, Oracle Big Data

Appliance extends Oracle’s industry-leading implementation of SQL to Hadoop and NoSQL

systems. By combining the newest technologies from the Hadoop ecosystem and powerful

Oracle SQL capabilities together on a single pre-configured platform, Oracle Big Data

Appliance is uniquely able to support rapid development of new Big Data applications and

tight integration with existing relational data.

Oracle Big Data Appliance is optimized to capture and analyze the massive volumes of varied

data generated by social media feeds, email, web logs, photographs, smart meters, sensors,

and similar devices.

It is possible to connect your existing Oracle Database host to the Oracle BDA through a

network to add new data sources to an existing data warehouse.


The core design principle of the Oracle Big Data Appliance is to simplify access to all data, which provides the following benefits:

• No Bottlenecks

• Full Stack Install and Upgrades

• Simplified Management

• Cluster Growth

• Critical Node Migration

• Always Highly Available

• Always Secure

• Very Competitive Price Point


Complete the following checklists to ensure that the site is prepared for the Oracle BDA. This

is covered in the Oracle Big Data Appliance (BDA): Pre-Installation Steps lesson in this

course:

1. System Components

2. Data Center Room

3. Data Center Environment

4. Access Route

5. Facility Power

6. Safety

7. Logistics

8. Network

9. Auto Service Request

10. Oracle Enterprise Manager

11. Reracking


Complete the fields in the Pre-installation Site Evaluation form to provide basic customer

contact information and to evaluate the overall readiness of the site for installation.

The form shows the following information:

• Customer name, signature and date (optional), address, and contact.

• Oracle installation service request number (if known).

• Installation type: How many racks? Is this a new installation or is it an extension to an

existing installation?

• Is the site prepared for installation? The choices are:

- A: Pass. All complete and ready for installation

- B: Conditional pass. Not ready, but all open issues are scheduled to be fixed

- F: Fail. Not ready, and some open issues are not scheduled to be fixed

• Open issues: What did not pass? When will the issue be resolved?

• Comments.
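
The readiness choices on the form can be read as a simple decision rule. The following is a minimal sketch under stated assumptions: the `SiteReadiness` enum, the `evaluate_site` function, and the `scheduled_fix` field are hypothetical names used only for illustration and are not part of any Oracle tool or form.

```python
from enum import Enum


class SiteReadiness(Enum):
    """Grades taken from the Pre-installation Site Evaluation form."""
    PASS = "A"              # All complete and ready for installation
    CONDITIONAL_PASS = "B"  # Not ready, but all open issues are scheduled to be fixed
    FAIL = "F"              # Not ready, and some open issues are not scheduled


def evaluate_site(open_issues):
    """Grade a site from its open issues.

    open_issues is a list of dicts; the hypothetical 'scheduled_fix' key
    holds a planned resolution date, or None if no fix is scheduled.
    """
    if not open_issues:
        return SiteReadiness.PASS
    if all(issue.get("scheduled_fix") for issue in open_issues):
        return SiteReadiness.CONDITIONAL_PASS
    return SiteReadiness.FAIL


issues = [
    {"item": "Facility Power", "scheduled_fix": "next week"},
    {"item": "Access Route", "scheduled_fix": None},
]
print(evaluate_site(issues).value)
```

With one issue left unscheduled, the sketch grades the site as F, which matches the form's definition of a fail.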


The Oracle BDA Configuration Generation Utility is an application that is required for Oracle BDA deployment. The utility:

• Acquires information from you, such as IP addresses and software preferences.

• Guides you through a series of pages and then generates a set of configuration files.

• Helps automate the deployment process and ensures that Oracle BDA is configured to your specifications. The generated files are used when you run the mammoth command to install the software on your BDA.

The Oracle BDA Configuration Generation Utility is covered in detail in the Oracle BDA Pre-

Installation Steps lesson in this course.


The mammoth utility installs and configures the software on the Oracle BDA (across all

servers in the rack) by using the files generated by the BDA Configuration Generation Utility.

This is covered in detail in the Working With Mammoth lesson in this course.

A cluster can be dedicated to either CDH (Hadoop) or Oracle NoSQL Database.

Mammoth also performs the following tasks:

• Creates the required user accounts

• Starts the correct services

• Sets the appropriate configuration parameters

When Mammoth is done, you have a fully functional Hadoop or NoSQL cluster.


Elastic Configurations

Big Data Appliance is designed to expand as your data and requirements grow. Initial big data implementations may start with the Big Data Appliance Starter Rack. This six-server rack comes fully equipped with the complete set of switches and power distribution units (PDUs) required for a full rack. The Starter Rack and its switching gear enable the appliance to be expanded easily and efficiently, in single-node hardware increments, to larger configurations using the Oracle Big Data Appliance X5-2 High Capacity (HC) Node plus InfiniBand Infrastructure.

You can also expand older machines with new-generation servers. A Full BDA Rack delivers an optimal blend of capacity and expansion options. It comes with 18 servers.

The Oracle BDA enables you to scale out elastically. In addition to expanding the system within a rack, you can connect multiple racks using the integrated InfiniBand fabric to form larger configurations. Up to 18 racks can be connected in a non-blocking manner by connecting InfiniBand cables, without the need for any external switches.
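
The sizing figures above lend themselves to a quick worked calculation. This is a rough sketch, assuming every connected rack is a full 18-server rack; the constant and function names are illustrative only.

```python
STARTER_RACK_SERVERS = 6   # servers in a Starter Rack
FULL_RACK_SERVERS = 18     # servers in a Full Rack
MAX_CONNECTED_RACKS = 18   # racks connectable without external switches


def servers_to_add(current_servers):
    """Single-node increments still available within one rack."""
    if not STARTER_RACK_SERVERS <= current_servers <= FULL_RACK_SERVERS:
        raise ValueError("a rack holds between 6 and 18 servers")
    return FULL_RACK_SERVERS - current_servers


def max_servers(racks):
    """Total servers when 'racks' full racks share the InfiniBand fabric."""
    if not 1 <= racks <= MAX_CONNECTED_RACKS:
        raise ValueError("between 1 and 18 racks can be connected")
    return racks * FULL_RACK_SERVERS


print(servers_to_add(STARTER_RACK_SERVERS))  # room left in a Starter Rack
print(max_servers(MAX_CONNECTED_RACKS))      # largest switchless configuration
```

Under these assumptions, a Starter Rack has room for 12 more servers, and 18 full racks give 324 servers in total.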


In the slide example, some of the components of the Sun Server X5-2L in an Oracle BDA Full

Rack configuration are listed. In addition, some of the integrated software is also shown.

For detailed information about the complete components of the Sun Server X5-2L, see the

Oracle Big Data Appliance Owner's Guide.


An Oracle Big Data Appliance starter rack has the same hardware configuration as a full rack,

except that it comes with six servers instead of 18. All switches and power supplies are

included in the starter rack, and do not need to be upgraded or supplemented to support

additional servers.


BDA X5-2 Elastic Configurations enable you to expand a system in one-node increments by adding a BDA X5-2 High Capacity (HC) Node plus InfiniBand Infrastructure to a 6-node Starter Rack.

This increased flexibility enables customers to start with a production-scale cluster of 6 nodes (X5-2 or older) and then grow within the base rack up to 18 nodes. Customers can also expand across racks without any additional switching (no top-of-rack switch is required because all racks are on the same InfiniBand network) to build larger clusters. The expansion is supported by the Oracle Mammoth configuration utility and its CLI, which greatly simplifies the expansion of clusters.


All services are installed on all nodes in a CDH cluster, but individual services run only on

designated nodes. There are slight variations in the location of the services depending on the

configuration of the cluster.

The table in the slide identifies the services in CDH clusters configured within a single rack,

including starter racks and clusters with more than six nodes. Node01 is the first server in the

cluster and nodenn is the last server in the cluster.

Critical nodes are required for the cluster to operate normally and provide all services to

users. In contrast, the cluster continues to operate with no loss of service when a noncritical

node fails. On single-rack clusters, the critical services are installed initially on the first four

nodes of the cluster. The remaining nodes (node05 up to node18) only run noncritical

services. If a hardware failure occurs on one of the critical nodes, then the services can be

moved to another, noncritical server. For example, if node02 fails, then you might move its

critical services to node05. The table in the slide identifies the initial location of services for

clusters that are configured on a single rack.
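
The split between critical and noncritical nodes on a single-rack cluster can be expressed in a few lines. This is only an illustrative sketch: the helper names are hypothetical, and it assumes the nodeNN naming used in the slide and the initial service layout, before any services have been moved.

```python
import re

CRITICAL_NODE_COUNT = 4  # critical services start on the first four nodes


def node_index(name):
    """Extract the two-digit index from a name such as 'node05'."""
    match = re.fullmatch(r"node(\d{2})", name)
    if not match or int(match.group(1)) < 1:
        raise ValueError(f"unexpected node name: {name!r}")
    return int(match.group(1))


def is_critical(name):
    """True if the node initially hosts critical services."""
    return node_index(name) <= CRITICAL_NODE_COUNT


def migration_candidates(cluster_size):
    """Noncritical nodes that could take over a failed critical node's services."""
    return [f"node{i:02d}" for i in range(CRITICAL_NODE_COUNT + 1, cluster_size + 1)]


print(is_critical("node02"))
print(migration_candidates(6))
```

For a six-node starter rack, the only migration targets are node05 and node06.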


You can use the Oracle BDA Configuration Generation Utility to configure clusters for either

CDH or Oracle NoSQL Database. This is covered in the Oracle Big Data Appliance (BDA)

Pre-Installation Steps lesson in this course.

You can configure multiple clusters in a single rack, or a single cluster can span multiple

racks. Each CDH cluster must have at least six servers, and each Oracle NoSQL Database

cluster must have at least three servers. Therefore, a starter rack supports one CDH cluster

and a full rack supports up to three CDH clusters.
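
The cluster counts follow from integer division of the rack size by the minimum cluster size. The sketch below uses hypothetical names; the figure it produces for Oracle NoSQL Database is derived from the stated three-server minimum and is not quoted from the lesson.

```python
MIN_SERVERS = {"cdh": 6, "nosql": 3}  # minimum servers per cluster type


def max_clusters(rack_servers, kind):
    """Largest number of minimum-size clusters that fit in one rack."""
    return rack_servers // MIN_SERVERS[kind]


print(max_clusters(6, "cdh"))     # starter rack
print(max_clusters(18, "cdh"))    # full rack
print(max_clusters(18, "nosql"))  # derived, not stated in the lesson
```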


By default, Oracle Big Data Appliance uses the YARN implementation of MapReduce. You can use classic MapReduce (MR1) instead of YARN, but you can activate only one of the two: either the MapReduce service or the YARN service. You cannot use both implementations in the same cluster.


As mentioned earlier, critical nodes are required for the cluster to operate normally and to

provide all services to users. In contrast, the cluster continues to operate with no loss of

service when a noncritical node fails. The table in the slide shows the initial location of critical

services for clusters that are configured on a single rack. The critical services are initially

installed on the first four nodes of the cluster.

In a multirack cluster, the standby NameNode critical service runs on the first server of the second rack.

To move a critical node, you must ensure that all clients are reconfigured with the address of

the new node. Alternatively, you can wait for the failed server to be repaired. You must weigh

the loss of services against the inconvenience of reconfiguring the clients.

Nodes node05 through node18 are noncritical nodes. The Oracle BDA continues to operate with no loss of

service if a failure occurs on one of the noncritical nodes. The NameNode automatically

replicates the lost data so that it always maintains three copies. MapReduce jobs execute on

copies of the data that are stored elsewhere in the cluster.


One instance of the first (Active) NameNode initially runs on node01. If this node fails or goes offline (for example, if there is a restart), the second (Standby) NameNode, which runs on node02, automatically takes over to maintain the normal activities of the cluster.

Alternatively, if the second NameNode is already active, it continues without a backup. With only one NameNode, the cluster is vulnerable to failure because it has lost the redundancy needed for automatic failover.
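
The active/standby behavior described here can be modeled as a tiny state machine. This is a conceptual sketch only, not how HDFS implements high availability; the `HaPair` class is a hypothetical name chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HaPair:
    """An active service instance and its optional standby."""
    active: str
    standby: Optional[str]

    @property
    def has_redundancy(self):
        return self.standby is not None

    def fail(self, node):
        """Record the failure of one node in the pair."""
        if node == self.active:
            if self.standby is None:
                raise RuntimeError("no standby left: the service is unavailable")
            # The standby takes over automatically and runs without a backup.
            self.active, self.standby = self.standby, None
        elif node == self.standby:
            # The active instance continues, but redundancy is lost.
            self.standby = None


namenodes = HaPair(active="node01", standby="node02")
namenodes.fail("node01")
print(namenodes.active, namenodes.has_redundancy)
```

After node01 fails, node02 is active and the pair has no redundancy, so a second failure would take the service down.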

In multirack clusters, the standby NameNode service is installed on the first server of the second rack.

The MySQL backup database also runs on the second NameNode's server. If that server fails, MySQL Database continues to run, although there is no backup of the master database.


The first (Active) ResourceManager initially runs on node03. If this node fails or goes offline (for example, if there is a restart), the second (Standby) ResourceManager, which runs on node04, automatically takes over to distribute MapReduce tasks to specific nodes across the cluster. Alternatively, if the second ResourceManager is already active, it continues without a backup. With only one ResourceManager, the cluster is vulnerable to failure because it has lost the redundancy that is required for automatic failover. The following services are disrupted: Cloudera Manager, MySQL Master Database, Oracle Data Integrator, Hive, Hue, and Oozie.


The Oracle Integrated Lights Out Manager (ILOM) service processor runs its own embedded

operating system and has a dedicated Ethernet port, which together provide out-of-band

management capability. In addition, you can access Oracle ILOM from the server operating

system (Oracle Linux). By using Oracle ILOM, you can remotely manage Oracle Big Data

Appliance as if you were using a local KVM.

Oracle ILOM provides preinstalled advanced service processor (SP) hardware and software

to manage and monitor the Oracle BDA components.

You can use Oracle ILOM to:

• Learn about hardware errors and faults as they occur

• Remotely control the power state of a server

• View the graphical and non-graphical consoles

• View the current status of sensors and indicators on the system

• Determine the hardware configuration of your system

• Receive generated alerts about system events in advance

You can access the features and functions of Oracle ILOM by using either a web browser or a command-line interface.


You can use the following BDA utilities (among others) to manage and monitor the cluster:

• mammoth: Installs all end-user software onsite

• setup-root-ssh: Sets up password-less SSH for the root user for all the servers in

a Big Data Appliance rack

• dcli: Executes commands across a group of servers on Big Data Appliance and

returns the output

• bdacli: Queries various configuration files to return information about the rack, cluster,

server and so on

For a complete list of the available utilities, see the Oracle Big Data Appliance Owner's Guide.


Mammoth is the command-line utility that deploys software on the Oracle BDA. You can use Mammoth to:

• Set up the cluster using the generated configuration files.

• Create a cluster on one or more racks.

• Create multiple clusters on an Oracle BDA rack.

• Extend a cluster to new servers.

• Update a cluster with new software.

You can use the Mammoth utility to configure or remove optional services, including network encryption, disk encryption, Kerberos, Sentry, Oracle Audit Vault and Database Firewall, and Auto Service Request.

Before you install any software, you need to download the Mammoth bundle, which contains the installation files and the base image. Before you install the software, you must also use Oracle Big Data Appliance Configuration Generation Utility to generate the configuration files. Both topics are covered in the Oracle BDA Pre-Installation Steps and Working With Mammoth lessons in this course.

You use the same Mammoth bundle for all procedures regardless of the rack size, and whether you are creating CDH or Oracle NoSQL Database clusters, or upgrading existing clusters.


You must run the mammoth utility from the /opt/oracle/BDAMammoth directory, which is created when you run the BDAMammoth-ol6-4.2.0.run script.

To run the utility, use the mammoth command followed by the arguments and, if required, the cluster name. For example, to run all of the required installation steps on the bda1h1 cluster, run the mammoth command with the -i argument, which runs all the steps on the cluster. If Mammoth upgraded the base image, it prompts you to reboot when it completes step 3 of the installation.
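
As a rough illustration of the invocation described above, the following sketch assembles the command line without executing it. The `mammoth_install_command` helper is hypothetical, and the `./mammoth` form assumes the utility is started from inside its own directory.

```python
import shlex

MAMMOTH_DIR = "/opt/oracle/BDAMammoth"  # directory named in the lesson


def mammoth_install_command(cluster_name=None):
    """Build the argument list for a full install (-i runs all steps)."""
    argv = ["./mammoth", "-i"]
    if cluster_name:
        argv.append(cluster_name)
    return argv


# Only print the command; running it requires root access on a BDA server.
command = shlex.join(mammoth_install_command("bda1h1"))
print(f"cd {MAMMOTH_DIR} && {command}")
```

Building the argument list separately from running it keeps the sketch safe to try on any machine.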


The table in the slide lists all of the arguments that you can use with the mammoth command.

The -l argument lists all the steps of the mammoth utility.

The -s <step #> argument runs the specified step on the primary rack.

The -r <start step>-<end step> argument runs steps from <start step> through <end step>.

The -i [cluster_name] argument runs all steps on the cluster, equivalent to -r 1-17. Use this option when configuring a new rack or adding a group of servers to a cluster.

The -e newnode1 newnode2 argument adds new nodes to an existing cluster and installs the needed software on the newly added nodes in one step.

The -p argument upgrades the cluster to the latest patch level.

The -p patchnumber argument installs a one-off patch.

The -u patchnumber argument uninstalls a one-off patch.

The -c argument runs the BDA cluster check tests.

The -h argument displays the command help, including the usage and a list of steps.

The -v argument displays the version number.
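
To make the relationship between -r and -i concrete, here is a small sketch that expands a step range into individual step numbers. The function name is hypothetical, and the total of 17 steps is inferred from the statement that -i is equivalent to -r 1-17.

```python
TOTAL_STEPS = 17  # inferred from "-i is equivalent to -r 1-17"


def expand_step_range(spec):
    """Turn a '-r' style range such as '2-6' into a list of step numbers."""
    start_text, separator, end_text = spec.partition("-")
    if not separator:
        raise ValueError("expected <start step>-<end step>")
    start, end = int(start_text), int(end_text)
    if not 1 <= start <= end <= TOTAL_STEPS:
        raise ValueError(f"steps must satisfy 1 <= start <= end <= {TOTAL_STEPS}")
    return list(range(start, end + 1))


print(expand_step_range("2-6"))
# The full range covers the same steps that -i runs.
print(len(expand_step_range("1-17")))
```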


Let's go over some examples using different arguments with the mammoth command.

If you run the mammoth command with the -l argument, it lists all of the mammoth steps.

If you run the mammoth command with the -s 3 bda1h1 argument, it runs only step 3 on the bda1h1 cluster that is currently being set up.

If you run the mammoth command with the -r 2-6 bda1h1 argument, it runs only steps 2 through 6 on the cluster that is currently being set up.

If you run the mammoth command with the -i bda1h1 argument, it performs a complete install on cluster bda1h1.

If you run the mammoth command with the -p argument, it upgrades a cluster to the latest patch level.

If you run the mammoth command with the -c argument, it runs the BDA cluster check tests. These tests are also run automatically after Step 15.


Some of the BDA utilities require a passwordless Secure Shell (SSH) between the local

server and all target servers.

To set up passwordless SSH for root:

1. Connect to an Oracle BDA server using PuTTY or a similar utility. Select an SSH

connection type.

2. Log in as root.

3. Set up passwordless SSH for root across the rack. Run the setup-root-ssh script. You

see the message ssh key added from each server. You can now run any ssh command on

any server in the rack without entering a password.

To remove passwordless SSH from root, run the remove-root-ssh script.


Let's go over the parameters that you can use with the setup-root-ssh:

-C: Targets all servers in the cluster, using the list of servers in /opt/oracle/bda/cluster-hosts-infiniband. If you do not specify the target servers, then setup-root-ssh uses all servers in the rack.

-c host1, host2,… Targets the specified servers.

-g groupfile: Targets a user-defined set of servers listed in a groupfile. You can enter either server names or IP addresses in the file, one per line.

-j "etho0_ips[range]": Specifies the range of servers in a starter rack [1-6] or a starter rack with additional servers [1-12].

-h: Displays Help.

-p password: Specifies the root password on the command line. Oracle recommends that you omit this parameter. You will be prompted to enter the password, which the utility does not display on your screen.

The example in the slide uses the setup-root-ssh command without any arguments. You will be prompted to enter the password for each server and see the message ssh key added from each server. You can now run any ssh command on any server in the rack without entering a password.
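As a rough sketch of how these options might be used, the following shell snippet prints (rather than runs) a few setup-root-ssh invocations. The run helper, the RUN variable, the group file path, and the host names bda1node01 to bda1node03 are illustrative assumptions, not part of the BDA software. Set RUN=1 only on a BDA server where you are logged in as root.

```shell
# Dry-run helper (hypothetical): prints the command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "+ $*"
  fi
}

# Hypothetical group file: one server name or IP address per line.
GROUPFILE="${TMPDIR:-/tmp}/bda-group.txt"
printf '%s\n' bda1node01 bda1node02 bda1node03 > "$GROUPFILE"

run setup-root-ssh                   # all servers in the rack (prompts for the root password)
run setup-root-ssh -C                # all servers in the cluster
run setup-root-ssh -g "$GROUPFILE"   # only the servers listed in the group file
run remove-root-ssh                  # removes passwordless SSH for root again
```

The -p option is left out on purpose, in line with the recommendation above to let the script prompt for the password.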


The dcli utility executes commands across a group of servers on the Oracle BDA and

returns the output. You can run the utility from any server.

The dcli utility requires passwordless Secure Shell (SSH) access between the local server and all target servers. You run the dcli utility on the local server.

The commands specified in dcli execute on the specified target servers (or on all servers if you omit the server names).


The syntax is: dcli [option] [command]

You can omit all options to run a command on ALL servers in the current rack. The command

is any command that runs from the operating system prompt.

Let's look at some of the available options:

-c nodes: Specifies a comma-separated list of the Oracle BDA servers where the command

is executed.

-C: Uses the list of servers in /opt/oracle/bda/cluster-rack-infiniband as the

target.

-g groupfile: Specifies a file containing a list of the Oracle BDA servers where the

command is executed. You can use either server names or IP addresses in the file, one per

line.

-h or --help: Displays a description of the commands.

-t: Lists the target servers.

--version: Displays the dcli version number.
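The targeting options can be compared side by side in a short sketch. The dcli_cmd helper below is hypothetical and only assembles and prints a command line. The host names, the group file path, and the uptime command are placeholders for your own servers and for any operating system command.

```shell
# Hypothetical helper: prints the dcli command line for a given target selection.
#   rack        -> no option (all servers in the current rack)
#   cluster     -> -C
#   nodes:a,b   -> -c a,b
#   group:FILE  -> -g FILE
dcli_cmd() {
  target=$1
  shift
  case "$target" in
    rack)    echo "dcli $*" ;;
    cluster) echo "dcli -C $*" ;;
    nodes:*) echo "dcli -c ${target#nodes:} $*" ;;
    group:*) echo "dcli -g ${target#group:} $*" ;;
    *)       echo "unknown target: $target" >&2; return 1 ;;
  esac
}

dcli_cmd rack uptime
dcli_cmd cluster uptime
dcli_cmd nodes:bda1node01,bda1node02 uptime
dcli_cmd group:/tmp/bda-group.txt uptime
```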


After you set up SSH between the local server and all target servers, you can use the dcli utility. Let's go over some examples.

In the first example, we use dcli imageinfo to verify the image version on all target servers.

In the second example, we use dcli reboot to reboot all the servers in the cluster.

In the third example, we use dcli -t, which returns the default list of target servers.

In the last example, we use dcli "ls /root/BDA*" to list the files in the /root folder on all servers whose names start with BDA.
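The slide is not reproduced in this transcript, so the commands described above are collected here as a sketch. The function name and the availability guard are assumptions. The reboot example is left commented out because it restarts the target servers.

```shell
# Runs the read-only dcli examples from the text; returns 1 if dcli is unavailable.
dcli_examples() {
  if ! command -v dcli >/dev/null 2>&1; then
    echo "dcli not found; run this on an Oracle BDA server" >&2
    return 1
  fi
  dcli imageinfo            # image version on all target servers
  dcli -t                   # default list of target servers
  dcli "ls /root/BDA*"      # files in /root whose names start with BDA
  # dcli reboot             # reboots the target servers; uncomment only if intended
}
```

Call dcli_examples as root, after passwordless SSH has been set up.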


The bdacli utility:

• Queries various configuration files to return information about the rack, cluster, server,

InfiniBand network, and software patches.

• Adds and removes patches and optional services.

• Migrates critical services between critical nodes, and adds and removes servers from a

cluster.

• Displays usage information if no parameters are included on the command line or the

values are undefined.

The syntax is bdacli action [parameters]. The action element can be any of several actions, including the following:

• help: Displays general usage information for bdacli, a list of actions, and a list of

supported parameters for the getinfo action.

• {add | remove} patch patch_number: Adds or removes a software patch on the

Oracle BDA that matches patch_number. You must log in as root to use add or

remove.

• admin_cluster parameter node_name: Enables you to administer the nodes in a

cluster in response to a failing server.


The bdacli utility uses several actions and parameters. On this page, we will just mention a

few of those actions and parameters. For a complete list, refer to the Oracle BDA Owner's Guide. Let's look at some of the parameters that we can use with the admin_cluster action.

migrate: Moves the services from a critical node to a noncritical node, and decommissions

the failing server in Cloudera Manager. You specify the name of the failing critical node, and

the utility selects the noncritical node for the migration. When migration is complete, the new

node has all of the functionality of the original critical node. You can only migrate a critical

node, and should do so only when it is failing.

decommission: Removes the specified node from the cluster and decommissions the server

in Cloudera Manager. It also updates the Mammoth files. You can decommission a failing,

noncritical node.

recommission: Removes the node from the list of decommissioned nodes, and

recommissions the server in Cloudera Manager. Use this command after decommissioning

and repairing a failing server.

reprovision: Restores a server to the cluster as a noncritical node, and re-commissions

the server in Cloudera Manager. Use this command after migrating the services of a critical

node and repairing the failing server.
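The four admin_cluster parameters pair up by node role and by whether the server is failing or has been repaired. The sketch below encodes that pairing in a hypothetical helper, suggest_action, which only prints the command that the text recommends for each situation. It does not run bdacli, and the node names in the calls are placeholders.

```shell
# Prints the bdacli command suggested for a node.
#   $1 = node name, $2 = critical | noncritical, $3 = failing | repaired
suggest_action() {
  node=$1
  role=$2
  state=$3
  case "$role:$state" in
    critical:failing)     echo "bdacli admin_cluster migrate $node" ;;
    critical:repaired)    echo "bdacli admin_cluster reprovision $node" ;;
    noncritical:failing)  echo "bdacli admin_cluster decommission $node" ;;
    noncritical:repaired) echo "bdacli admin_cluster recommission $node" ;;
    *) echo "unknown role/state: $role/$state" >&2; return 1 ;;
  esac
}

suggest_action bda1node01 critical failing
suggest_action bda1node07 noncritical failing
```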


Let's go over some examples:

In the first example, the command bdacli admin_cluster migrate node01 restores HA with a single command. This command moves the services from the critical node01 to a noncritical node, and decommissions the failing server in Cloudera Manager. You specify the name of the failing critical node such as node01, and the utility selects the noncritical node for the migration. When migration is complete, the new node has all of the functionality of the original critical node.

In the second example, the bdacli admin_cluster reprovision node01 restores a server to the cluster as a noncritical node, and re-commissions the server in Cloudera Manager. Use this command after migrating the services of a critical node and repairing the failing server.

In the third example, the bdacli getinfo rack_name command returns the rack's name and the bdacli getinfo cluster_name command returns the cluster's name.

In the fourth example, the bdacli getinfo rack_server_names command returns an ordered list of the host names in the rack.

In the next example, the bdacli getinfo server_mammoth_installed command returns true if the Mammoth utility has deployed the Oracle BDA software on this server; otherwise, it returns false.

In the final example, the bdacli add patch 1234 command installs patch 1234.
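The getinfo queries above lend themselves to a small loop. In this sketch, show_info and the BDACLI variable are hypothetical conveniences (the variable lets you substitute a stub when bdacli is not installed). The parameter names are the ones mentioned in the text.

```shell
# Command to invoke; override BDACLI with a stub when testing off the appliance.
BDACLI="${BDACLI:-bdacli}"

# Prints "parameter=value" for each getinfo parameter given.
show_info() {
  for param in "$@"; do
    printf '%s=%s\n' "$param" "$($BDACLI getinfo "$param")"
  done
}

# Only query when the command is actually available.
if command -v "$BDACLI" >/dev/null 2>&1; then
  show_info rack_name cluster_name rack_server_names server_mammoth_installed
fi
```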


In this example, we have a 12-node BDA rack in production, with Hadoop HA and security set up, that is ready to load data. The rack name is bda1 and the CDH cluster name is bda1h1. We will see what happens when you expand the number of nodes in the cluster to span two racks.

As we saw earlier in this lesson, you can perform a full install of the BDA software on the Oracle BDA cluster by using the mammoth utility: mammoth -i bda1h1, where the -i argument runs all the steps on the cluster and bda1h1 is the name of the cluster that we chose while running the Oracle Configuration Utility.


All services are installed on all nodes in a CDH cluster, but individual services run only on

designated nodes. There are slight variations in the location of the services depending on the

configuration of the cluster.

Critical nodes are required for the cluster to operate normally and provide all services to

users. In contrast, the cluster continues to operate with no loss of service when a noncritical

node fails. On single-rack clusters, the critical services are installed initially on the first four nodes of the cluster, namely node01, node02, node03, and node04. The remaining nodes

(node05 up to nodenn) only run noncritical services. If a hardware failure occurs on one of

the critical nodes, the second Standby critical node takes over, and you can also move the services to another, noncritical server. For example, if node02 (Active ResourceManager)

fails, then you might move its critical services to any available noncritical node in the cluster such as node05. The table in the slide identifies the initial location of services for CDH

clusters that are configured on a single rack.
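As an illustration of the initial single-rack layout described above, the hypothetical node_role helper below classifies a node name as critical or noncritical. It reflects only the initial placement (node01 to node04 critical). After a migration or a rack expansion the real assignment differs, so treat it as a reading aid rather than a source of truth.

```shell
# Initial single-rack placement only: node01 to node04 host the critical services.
node_role() {
  case "$1" in
    node01|node02|node03|node04) echo critical ;;
    node[0-9][0-9])              echo noncritical ;;
    *) echo "unrecognized node name: $1" >&2; return 1 ;;
  esac
}

for n in node01 node04 node05 node12; do
  printf '%s: %s\n' "$n" "$(node_role "$n")"
done
```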


Let's say on day 90, we expand cluster bda1h1 by 12 nodes. node13 to node18 are added

to the bda1 rack which makes it a full rack. In addition, the second starter BDA rack, bda2,

contains node01 to node06. The bda1h1 cluster now spans two racks.

To expand the cluster to use 12 more nodes, we use the mammoth command with the -e argument as follows:

./mammoth -e bda1node13, bda1node14, bda1node15, …, bda2node01, …, bda2node06

When expanding a cluster from one to two racks, mammoth moves the critical services from node02 (the Standby NameNode) and node04 (the Standby ResourceManager) of the

first rack to node01 and node02 of the second rack. node02 of the first rack becomes a

noncritical node. The Active NameNode continues to run on node01 of bda1 and the Active

ResourceManager continues to run on node03 of bda1.
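Typing twelve host names by hand is error-prone, so the list for the -e argument can be generated. The sketch below rests on several assumptions: node_range is a hypothetical helper, the host names follow the bda1nodeNN and bda2nodeNN pattern used in this example, and the comma-and-space separator mirrors the command as written above. It only prints the mammoth command. Check the exact list format in the Oracle BDA Owner's Guide before running it.

```shell
# Prints PREFIX followed by zero-padded numbers FIRST..LAST, one per line.
node_range() {
  prefix=$1
  i=$2
  last=$3
  while [ "$i" -le "$last" ]; do
    printf '%s%02d\n' "$prefix" "$i"
    i=$((i + 1))
  done
}

# node13 to node18 complete rack bda1; node01 to node06 form starter rack bda2.
NEW_NODES=""
for n in $(node_range bda1node 13 18) $(node_range bda2node 1 6); do
  NEW_NODES="${NEW_NODES:+$NEW_NODES, }$n"
done

# Print the command for review instead of executing it.
echo "./mammoth -e $NEW_NODES"
```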


If a server starts failing, you must take steps to maintain the services of the cluster with as little interruption as possible. You can use the bdacli utility to easily manage a failing server.

One of the management steps is called decommissioning. Decommissioning stops all roles

for all services, thereby preventing data loss. Cloudera Manager requires that you

decommission a CDH node before retiring it. When a noncritical node fails, there is no loss of service. However, when a critical node such as node01 fails in a CDH cluster, services with a

single point of failure are unavailable. You must decide between these alternatives:

1. Wait for repairs to be made, and endure the loss of service until they are complete.

2. Move the critical services to another node. This choice may require that some clients are

reconfigured with the address of the new node. For example, if the second

ResourceManager node (typically node03) fails, then users must redirect their browsers to

the new node to access Cloudera Manager.

Both the NameNodes and ResourceManagers in the Oracle BDA have HA and

automatic failover. If the first (active) NameNode fails or goes offline (such as a restart), then

the second (hot standby) NameNode automatically takes over to maintain the normal activities of the cluster. In our example, if the Active NameNode fails on node01 in bda1, then the Standby NameNode on node01 in bda2 takes over and becomes the Active NameNode.


If the second NameNode (or second ResourceManager) is already active, it continues without

a backup. With only one NameNode (or one ResourceManager), the cluster is vulnerable to

failure. The cluster has lost the redundancy needed for automatic failover. You can use the bdacli admin_cluster migrate command to restore HA with a single command.

In the first example, the command bdacli admin_cluster migrate bda1node01

restores HA with a single command. This command moves the services from the failing critical node01 to a noncritical node such as node10, and decommissions the failing server

in Cloudera Manager. You specify the name of the failing critical node such as bda1node01,

and the utility selects the noncritical node for the migration. When migration is complete, the

new node has all of the functionality of the original critical node.

Once the failing node/server is repaired or replaced, you can reprovision the server as a

noncritical node. Use the same name as the migrated node for node_name, such as bda1node01 as follows: bdacli admin_cluster reprovision bda1node01

When this command is completed, bda1node01 becomes a noncritical node.


In this lesson, you should have learned how to:

• Set up and configure the Oracle BDA

• Identify some of the hardware and software components of the Oracle BDA


In this course, we discussed Introduction to the Hadoop Ecosystem, Introduction to the

Oracle BDA, Oracle BDA Pre-Installation Steps, and Working With Mammoth.

You should now be able to:

• Define the Hadoop ecosystem and its components

• Describe Apache Hadoop Distributed File System (HDFS), MapReduce, Spark, and

YARN

• Complete the BDA Site Checklists

• Run the Oracle BDA Configuration Utility

• Install the Oracle BDA Mammoth software on the Oracle BDA

• Use, administer, and manage the Oracle Big Data Appliance (BDA)


The Oracle Learning Library offers other self-paced courses about Oracle Big Data Appliance.

Visit the Oracle Learning Library to learn about the courses.


Oracle University offers in-class courses about Oracle Big Data and Oracle NoSQL Database. Visit Oracle University to learn about the following courses:

• Oracle Big Data Fundamentals

• Oracle NoSQL Database for Developers

• Oracle NoSQL Database for Administrators


The Oracle Learning Library offers many free demonstrations and tutorials.

And, of course, the Oracle Big Data Appliance documentation and online help embedded

within the product are also valuable resources.
