Introduction to the Oracle Big Data Appliance
TRANSCRIPT
Hello and welcome to this online, self-paced course titled Administering and Managing the
Oracle Big Data Appliance (BDA). This course contains several lessons. This lesson is titled
Introduction to the Oracle Big Data Appliance (BDA). My name is Lauran Serhal. I am a
curriculum developer at Oracle and I have helped educate customers on Oracle products
since 1995. I'll be guiding you through this course, which consists of lectures, demos, and
review sessions.
The goal of this lesson is to help you understand how to set up and configure the Oracle BDA
and to identify some of its hardware and software components.
Introduction to the Oracle Big Data Appliance - 1
Introduction
Before we begin, take a look at some of the features of this course player. If you’ve viewed a
similar self-paced course in the past, feel free to skip this slide.
Menu
This is the Menu tab. It’s set up to automatically progress through the course in a linear
fashion, but you can also review the material in any order. Just click a slide title in the outline
to display its contents.
Notes
Click the Notes tab to view the audio transcript for each slide.
Search
Use the Search field to find specific information in the course.
Player Controls
Use these controls to pause, play, or move to the previous or next slide. Use the interactive
progress bar to fast forward or rewind the current slide. Some interactive slides in this course
may contain additional navigation and controls. The view for certain slides may change so
that you can see additional details.
Resources (Optional)
Click the Resources button to access any attachments associated with this course.
Glossary (Optional)
Click the Glossary button to view key terms and their definitions.
Introduction to the Oracle Big Data Appliance - 2
So, you know the title of the course, but you may be asking yourself, “Is this the right course
for me?” Click the bars to learn about the course objectives, target audience, and
prerequisites.
Introduction to the Oracle Big Data Appliance - 3
What can you expect to get out of this course? Here are the core learning objectives.
After completing this course, you should be able to define the Hadoop ecosystem and its
components including Hadoop’s Distributed File System (HDFS), MapReduce, Spark, YARN,
and some other related projects. You will also learn how to complete the BDA Site Checklists,
run the Oracle BDA Configuration Utility, and install the Oracle BDA Mammoth software on
the Oracle BDA. Finally, you will learn how to use, administer, and manage the Oracle Big
Data Appliance.
Introduction to the Oracle Big Data Appliance - 4
Who is this course for? Here is the intended audience.
• Application Developers
• Database Administrators
• Hadoop/Big Data Cluster Administrators
• Hadoop Programmers
Introduction to the Oracle Big Data Appliance - 5
Before taking this course, you should have basic knowledge of Hadoop, exposure to Big
Data, and optionally some basic database knowledge.
Introduction to the Oracle Big Data Appliance - 6
This course covers the following lessons:
In the Introduction to the Hadoop Ecosystem lesson, you define the Hadoop ecosystem and
describe the Hadoop core components and the other related projects in the ecosystem. You
also learn how to choose a Hadoop distribution and version, learn about the
architectural components of HDFS, and use the FS shell command-line interface (CLI) to interact
with data stored in HDFS. Finally, you review MapReduce, Spark, and YARN.
In the Introduction to the Oracle BDA lesson, you identify the Oracle Big Data Appliance
(BDA) and its hardware and software components.
In the Oracle BDA Pre-Installation Steps lesson, you learn how to download and complete
the BDA Site Checklists. You also learn how to download and run the Oracle BDA
Configuration Utility and then review the generated configuration files. Finally, you identify the
next steps in installing the Oracle BDA.
In the Working With Mammoth lesson, you learn how to download the Oracle BDA
Mammoth Software Deployment Bundle from My Oracle Support. You also learn how to install
a CDH or NoSQL cluster based on your specifications. You then learn how to install the
Oracle BDA Mammoth Software Deployment Bundle using the Mammoth utility.
Introduction to the Oracle Big Data Appliance - 7
Now that you’ve learned about some of the other lessons, let’s take a look at the Introduction
to the Oracle Big Data Appliance lesson.
Oracle Big Data Appliance is a flexible, high-performance, and secure engineered system for
running diverse workloads on Hadoop and NoSQL systems.
After completing this lesson, you should be able to:
• Set up and configure the Oracle BDA
• Identify some of the hardware and software components of the Oracle BDA
Introduction to the Oracle Big Data Appliance - 8
Oracle has simplified the deployment of big data solutions by delivering engineered systems. The BDA and Exadata form the backbone of the Oracle Big Data Management System (BDMS), which provides all of the capabilities required to capture data from sources, transform that data, and analyze it. The BDA ships with Cloudera's Distribution Including Apache Hadoop (CDH), specifically the Enterprise Data Hub edition. It includes HDFS, YARN, Spark, Oozie (workflow), Cloudera Manager, security (Sentry), HDFS encryption, Navigator, and more.
1. Next, Oracle adds its own components for integrating the BDA with Oracle Database and for performing discovery and analysis. For example, Big Data Discovery allows business users to find interesting data sets, wrangle the data, and then explore it, all through an intuitive UI with no programming required.
2. Oracle NoSQL Database is a leading key-value store. Oracle R Advanced Analytics for Hadoop (ORAAH) enables R users to analyze data on the cluster in their preferred analytic language while leveraging the processing power of the cluster.
3. Big Data Spatial and Graph is a new offering that enables both spatial analysis and the ability to understand relationships and spheres of influence in data.
4. Big Data Connectors enable integration with Oracle Database as well as analysis using the XQuery language.
5. Big Data SQL enables unified analysis across all your data. It allows you to securely query data in NoSQL, Oracle Database, and HDFS by using Oracle's rich SQL query language.
Introduction to the Oracle Big Data Appliance - 9
The Oracle BDA is a flexible, high-performance, secure platform for running diverse
workloads on Hadoop and NoSQL systems. With Oracle Big Data SQL, Oracle Big Data
Appliance extends Oracle’s industry-leading implementation of SQL to Hadoop and NoSQL
systems. By combining the newest technologies from the Hadoop ecosystem and powerful
Oracle SQL capabilities together on a single pre-configured platform, Oracle Big Data
Appliance is uniquely able to support rapid development of new Big Data applications and
tight integration with existing relational data.
Oracle Big Data Appliance is optimized to capture and analyze the massive volumes of varied
data generated by social media feeds, email, web logs, photographs, smart meters, sensors,
and similar devices.
It is possible to connect your existing Oracle Database host to the Oracle BDA through a
network to add new data sources to an existing data warehouse.
Introduction to the Oracle Big Data Appliance - 10
The core design principle for the Oracle Big Data Appliance is to simplify access to all data,
which provides the following benefits:
• No Bottlenecks
• Full Stack Install and Upgrades
• Simplified Management
• Cluster Growth
• Critical Node Migration
• Always Highly Available
• Always Secure
• Very Competitive Price Point
Introduction to the Oracle Big Data Appliance - 11
Complete the following checklists to ensure that the site is prepared for the Oracle BDA. This
is covered in the Oracle Big Data Appliance (BDA): Pre-Installation Steps lesson in this
course:
1. System Components
2. Data Center Room
3. Data Center Environment
4. Access Route
5. Facility Power
6. Safety
7. Logistics
8. Network
9. Auto Service Request
10. Oracle Enterprise Manager
11. Reracking
Introduction to the Oracle Big Data Appliance - 12
Complete the fields in the Pre-installation Site Evaluation form to provide basic customer
contact information and to evaluate the overall readiness of the site for installation.
The form shows the following information:
• Customer name, signature and date (optional), address, and contact.
• Oracle installation service request number (if known).
• Installation type: How many racks? Is this a new installation or is it an extension to an
existing installation?
• Is the site prepared for installation? The choices are:
- A: Pass. All complete and ready for installation
- B: Conditional pass. Not ready, but all open issues are scheduled to be fixed
- F: Fail. Not ready, and some open issues are not scheduled to be fixed
• Open issues: What did not pass? When will the issue be resolved?
• Comments.
Introduction to the Oracle Big Data Appliance - 13
The Oracle BDA Configuration Generation Utility is an application that is required for Oracle
BDA deployment. The utility:
• Acquires information from you, such as IP addresses and software preferences.
• Guides you through a series of pages and then generates a set of configuration
files.
• Produces files that help automate the deployment process and ensure that Oracle BDA
is configured to your specifications. The generated files are used when you run the
Mammoth command to install the software on your BDA.
The Oracle BDA Configuration Generation Utility is covered in detail in the Oracle BDA Pre-
Installation Steps lesson in this course.
Introduction to the Oracle Big Data Appliance - 14
The mammoth utility installs and configures the software on the Oracle BDA (across all
servers in the rack) by using the files generated by the BDA Configuration Generation Utility.
This is covered in detail in the Working With Mammoth lesson in this course.
A cluster can be dedicated to either CDH (Hadoop) or Oracle NoSQL Database.
Mammoth also performs the following tasks:
• Creates the required user accounts
• Starts the correct services
• Sets the appropriate configuration parameters
When Mammoth is done, you have a fully functional Hadoop or NoSQL cluster.
Introduction to the Oracle Big Data Appliance - 15
Elastic Configurations
Big Data Appliance is designed to expand as your data and requirements grow. Initial big data
implementations may start with the Big Data Appliance Starter Rack. This six-server rack comes
fully equipped with the complete set of switches and power distribution units (PDUs) required for
a full rack. The Starter Rack and its switching gear enable the appliance to be easily and
efficiently expanded in single-node hardware increments to larger configurations by using the Oracle
Big Data Appliance X5-2 High Capacity (HC) Node plus InfiniBand Infrastructure.
You can also expand older machines with new-generation servers. A Full BDA Rack delivers
an optimal blend of capacity and expansion options. It comes with 18 servers.
The Oracle BDA enables you to elastically scale out. In addition to expanding the system
within a rack, multiple racks can be connected using the integrated InfiniBand fabric to form
larger configurations; up to 18 racks can be connected in a non-blocking manner by
connecting InfiniBand cables without the need for any external switches.
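To make the scale-out arithmetic concrete, here is a small illustrative calculation. It is only a sketch: the constant and function names are hypothetical, and the figures (18 servers in a full rack, up to 18 racks connected without external switches) are simply the ones quoted in this slide.

```python
import math

SERVERS_PER_FULL_RACK = 18          # a full rack comes with 18 servers
MAX_RACKS_NO_EXTERNAL_SWITCH = 18   # racks that can be cabled together over InfiniBand


def racks_needed(total_servers: int) -> int:
    """Return the minimum number of racks needed to hold total_servers."""
    if total_servers < 1:
        raise ValueError("total_servers must be positive")
    return math.ceil(total_servers / SERVERS_PER_FULL_RACK)


def fits_without_external_switches(total_servers: int) -> bool:
    """Return True if the servers fit within the 18-rack limit quoted in the slide."""
    return racks_needed(total_servers) <= MAX_RACKS_NO_EXTERNAL_SWITCH


if __name__ == "__main__":
    for servers in (6, 18, 24):
        print(servers, "servers ->", racks_needed(servers), "rack(s)")
```

For instance, a starter rack (6 servers) and a full rack (18 servers) each need one rack, while 24 servers need a second rack.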
Introduction to the Oracle Big Data Appliance - 16
In the slide example, some of the components of the Sun Server X5-2L in an Oracle BDA Full
Rack configuration are listed. In addition, some of the integrated software is also shown.
For detailed information about the complete components of the Sun Server X5-2L, see the
Oracle Big Data Appliance Owner's Guide.
Introduction to the Oracle Big Data Appliance - 17
An Oracle Big Data Appliance starter rack has the same hardware configuration as a full rack,
except that it comes with six servers instead of 18. All switches and power supplies are
included in the starter rack, and do not need to be upgraded or supplemented to support
additional servers.
Introduction to the Oracle Big Data Appliance - 18
BDA X5-2 Elastic Configurations enable you to expand a system in 1-node increments by
adding a BDA X5-2 High Capacity (HC) Node plus InfiniBand Infrastructure to a 6-node Starter
Rack.
This flexibility enables customers to start with a production-scale cluster of 6 nodes
(X5-2 or older) and then grow within the base rack up to 18 nodes. Customers can also
expand across racks without any additional switching (no top-of-rack switch is required because all racks are on the same
InfiniBand network) to build larger clusters. The expansion is fully supported by
the Oracle Mammoth configuration utility and its CLI, which greatly simplifies the expansion of clusters.
Introduction to the Oracle Big Data Appliance - 19
All services are installed on all nodes in a CDH cluster, but individual services run only on
designated nodes. There are slight variations in the location of the services depending on the
configuration of the cluster.
The table in the slide identifies the services in CDH clusters configured within a single rack,
including starter racks and clusters with more than six nodes. Node01 is the first server in the
cluster and nodenn is the last server in the cluster.
Critical nodes are required for the cluster to operate normally and provide all services to
users. In contrast, the cluster continues to operate with no loss of service when a noncritical
node fails. On single-rack clusters, the critical services are installed initially on the first four
nodes of the cluster. The remaining nodes (node05 up to node18) only run noncritical
services. If a hardware failure occurs on one of the critical nodes, then the services can be
moved to another, noncritical server. For example, if node02 fails, then you might move its
critical services to node05. The table in the slide identifies the initial location of services for
clusters that are configured on a single rack.
Introduction to the Oracle Big Data Appliance - 20
You can use the Oracle BDA Configuration Generation Utility to configure clusters for either
CDH or Oracle NoSQL Database. This is covered in the Oracle Big Data Appliance (BDA)
Pre-Installation Steps lesson in this course.
You can configure multiple clusters in a single rack, or a single cluster can span multiple
racks. Each CDH cluster must have at least six servers, and each Oracle NoSQL Database
cluster must have at least three servers. Therefore, a starter rack supports one CDH cluster
and a full rack supports up to three CDH clusters.
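The cluster counts above follow from simple integer division. The sketch below is an illustration only; the names MIN_SERVERS and max_clusters are hypothetical, and the minimum sizes (six servers for CDH, three for Oracle NoSQL Database) are the ones stated in this slide.

```python
# Minimum number of servers per cluster type, as stated in the slide.
MIN_SERVERS = {"CDH": 6, "NoSQL": 3}


def max_clusters(rack_servers: int, cluster_type: str = "CDH") -> int:
    """Return how many minimum-size clusters of one type fit in a rack."""
    if rack_servers < 0:
        raise ValueError("rack_servers must not be negative")
    return rack_servers // MIN_SERVERS[cluster_type]


if __name__ == "__main__":
    print("Starter rack (6 servers):", max_clusters(6), "CDH cluster")
    print("Full rack (18 servers):", max_clusters(18), "CDH clusters")
```

With 6 servers the function returns 1 and with 18 servers it returns 3, which matches the starter rack and full rack figures in the narration.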
Introduction to the Oracle Big Data Appliance - 21
By default, Oracle Big Data Appliance uses the YARN implementation of MapReduce. You
can use classic MapReduce (MR1) instead of YARN; however, you can activate only one of the two,
either the MapReduce service or the YARN service. You cannot use both implementations in the same
cluster.
Introduction to the Oracle Big Data Appliance - 22
As mentioned earlier, critical nodes are required for the cluster to operate normally and to
provide all services to users. In contrast, the cluster continues to operate with no loss of
service when a noncritical node fails. The table in the slide shows the initial location of critical
services for clusters that are configured on a single rack. The critical services are initially
installed on the first four nodes of the cluster.
In a multirack cluster, the Standby NameNode critical service runs on the first server of the
second rack.
To move a critical node, you must ensure that all clients are reconfigured with the address of
the new node. Alternatively, you can wait for the failed server to be repaired. You must weigh
the loss of services against the inconvenience of reconfiguring the clients.
node05 to node18 are noncritical nodes. The Oracle BDA continues to operate with no loss of
service if a failure occurs on one of the noncritical nodes. The NameNode automatically
replicates the lost data so that it always maintains three copies. MapReduce jobs execute on
copies of the data that are stored elsewhere in the cluster.
Introduction to the Oracle Big Data Appliance - 23
One instance of the first (Active) NameNode initially runs on node01. If this node fails or goes
offline (for example, if there is a restart), the second (Standby) NameNode, which runs on node02, automatically takes over to maintain the normal activities of the cluster.
Alternatively, if the second NameNode is already active, it continues without a backup. With
only one NameNode, the cluster is vulnerable to failure because it has lost the redundancy
needed for automatic failover.
In multirack clusters, the second (Standby) NameNode service is installed on the first server of the second rack.
The MySQL backup database also runs on the Standby NameNode node. If that node fails, MySQL Database continues to
run, although there is no backup of the master database.
Introduction to the Oracle Big Data Appliance - 24
The first (Active) ResourceManager initially runs on node03. If this node fails or goes offline
(for example, if there is a restart), the second (Standby) ResourceManager, which runs on node04, automatically takes over to distribute MapReduce tasks to specific nodes across the
cluster. Alternatively, if the second ResourceManager is already active, it continues without a
backup. With only one ResourceManager, the cluster is vulnerable to failure because it has
lost the redundancy that is required for automatic failover. The following services are also
disrupted: Cloudera Manager, MySQL Master Database, Oracle Data Integrator, Hive, Hue,
and Oozie.
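The initial placement of the critical services and their automatic failover partners can be summarized in a small lookup table. This is a sketch for illustration; the dictionary and function names are hypothetical, and the node assignments are the single-rack defaults described in the last two slides.

```python
from typing import Optional

# Initial location of critical services on a single-rack CDH cluster.
INITIAL_ROLES = {
    "node01": "Active NameNode",
    "node02": "Standby NameNode",
    "node03": "Active ResourceManager",
    "node04": "Standby ResourceManager",
}

# Which node takes over automatically when an Active node fails.
STANDBY_FOR = {"node01": "node02", "node03": "node04"}


def is_critical(node: str) -> bool:
    """Return True if the node initially hosts a critical service."""
    return node in INITIAL_ROLES


def failover_target(failed_node: str) -> Optional[str]:
    """Return the node that takes over automatically, or None if there is none."""
    return STANDBY_FOR.get(failed_node)


if __name__ == "__main__":
    print(failover_target("node01"))  # the Standby NameNode host
    print(is_critical("node05"))      # node05 runs only noncritical services
```

Note that the Standby nodes have no automatic replacement of their own, which is why the narration says the cluster runs without a backup once a Standby service has become active.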
Introduction to the Oracle Big Data Appliance - 25
The Oracle Integrated Lights Out Manager (ILOM) service processor runs its own embedded
operating system and has a dedicated Ethernet port, which together provide out-of-band
management capability. In addition, you can access Oracle ILOM from the server operating
system (Oracle Linux). By using Oracle ILOM, you can remotely manage Oracle Big Data
Appliance as if you were using a local KVM.
Oracle ILOM provides preinstalled advanced service processor (SP) hardware and software
to manage and monitor the Oracle BDA components.
You can use Oracle ILOM to:
• Learn about hardware errors and faults as they occur
• Remotely control the power state of a server
• View the graphical and non-graphical consoles
• View the current status of sensors and indicators on the system
• Determine the hardware configuration of your system
• Receive generated alerts about system events in advance
You can access the features and functions of Oracle ILOM by using either a web browser or
a command-line interface.
Introduction to the Oracle Big Data Appliance - 26
You can use the following BDA utilities (among others) to manage and monitor the cluster:
• mammoth: Installs all end-user software onsite
• setup-root-ssh: Sets up password-less SSH for the root user for all the servers in
a Big Data Appliance rack
• dcli: Executes commands across a group of servers on Big Data Appliance and
returns the output
• bdacli: Queries various configuration files to return information about the rack, cluster,
server, and so on
For a complete list of the available utilities, see the Oracle Big Data Appliance Owner's Guide.
Introduction to the Oracle Big Data Appliance - 27
Mammoth is the command-line utility that deploys software on the Oracle BDA. You can use Mammoth to:
• Set up the cluster by using the generated configuration files.
• Create a cluster on one or more racks.
• Create multiple clusters on an Oracle BDA rack.
• Extend a cluster to new servers.
• Update a cluster with new software.
You can use the Mammoth utility to configure or remove optional services, including network encryption, disk encryption, Kerberos, Sentry, Oracle Audit Vault and Database Firewall, and Auto Service Request.
Before you install any software, you need to download the Mammoth bundle, which contains the installation files and the base image. Before you install the software, you must also use Oracle Big Data Appliance Configuration Generation Utility to generate the configuration files. Both topics are covered in the Oracle BDA Pre-Installation Steps and Working With Mammoth lessons in this course.
You use the same Mammoth bundle for all procedures regardless of the rack size, and whether you are creating CDH or Oracle NoSQL Database clusters, or upgrading existing clusters.
Introduction to the Oracle Big Data Appliance - 28
You must run the mammoth utility from the /opt/oracle/BDAMammoth directory, which is
created when you run the BDAMammoth-ol6-4.2.0.run script.
To run the utility, use the mammoth command followed by the arguments and the cluster
name if required. For example, to run all of the required installation steps on our bda1h1 cluster, run the
mammoth command with the -i argument, which runs all the steps on
the cluster. If Mammoth upgraded the base image, it prompts you to reboot when it completes step 3 of
the installation.
Introduction to the Oracle Big Data Appliance - 29
The table in the slide lists all of the arguments that you can use with the mammoth command.
The -l argument lists all the steps of the mammoth utility.
The -s <step #> argument runs the specified step on the primary rack.
The -r <start step>-<end step> argument runs steps from <start step> through
<end step>.
The -i [cluster_name] argument runs all steps on the cluster, equivalent to -r 1-17.
Use this option when configuring a new rack or adding a group of servers to a cluster.
The -e newnode1 newnode2 argument adds new nodes to an existing cluster and installs
the needed software on the newly added nodes in one step.
The -p argument upgrades the cluster to the latest patch level.
The -p patchnumber argument installs a one-off patch.
The -u patchnumber argument uninstalls a one-off patch.
The -c argument runs the BDA cluster check tests.
The -h argument displays the command help including the usage and a list of steps.
The -v argument displays the version number.
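To see how the -r and -i arguments relate, the sketch below expands a step range into the individual step numbers. It is an illustration only and does not call Mammoth; the function names are hypothetical, and the total of 17 steps comes from the statement above that -i is equivalent to -r 1-17.

```python
from typing import List

TOTAL_STEPS = 17  # the transcript states that -i is equivalent to -r 1-17


def steps_for_range(step_range: str) -> List[int]:
    """Expand a '<start step>-<end step>' string into a list of step numbers."""
    start_text, end_text = step_range.split("-")
    start, end = int(start_text), int(end_text)
    if not 1 <= start <= end <= TOTAL_STEPS:
        raise ValueError(f"step range must lie within 1-{TOTAL_STEPS}")
    return list(range(start, end + 1))


def steps_for_full_install() -> List[int]:
    """Return the steps that a full install (-i) would cover."""
    return steps_for_range(f"1-{TOTAL_STEPS}")


if __name__ == "__main__":
    print(steps_for_range("2-6"))         # the steps selected by -r 2-6
    print(len(steps_for_full_install()))  # the number of steps run by -i
```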
Introduction to the Oracle Big Data Appliance - 30
Let's go over some examples using different arguments with the mammoth command.
If you run the mammoth command with the -l argument, it lists all of the mammoth steps.
If you run the mammoth command with the -s 3 bda1h1 arguments, it runs only step 3 on
the bda1h1 cluster that is currently being set up.
If you run the mammoth command with the -r 2-6 bda1h1 arguments, it runs only steps 2
through 6 on the cluster that is currently being set up.
If you run the mammoth command with the -i bda1h1 argument, it performs a complete
install on cluster bda1h1.
If you run the mammoth command with the -p argument, it upgrades a cluster to the latest
patch level.
If you run the mammoth command with the -c argument, it runs the BDA cluster check tests.
These are run automatically after Step 15 as well.
Introduction to the Oracle Big Data Appliance - 31
Some of the BDA utilities require passwordless Secure Shell (SSH) access between the local
server and all target servers.
To set up passwordless SSH for root:
1. Connect to an Oracle BDA server using PuTTY or a similar utility. Select an SSH
connection type.
2. Log in as root.
3. Set up passwordless SSH for root across the rack. Run the setup-root-ssh script. You
see the message ssh key added from each server. You can now run any ssh command on
any server in the rack without entering a password.
To remove passwordless SSH from root, run the remove-root-ssh script.
Introduction to the Oracle Big Data Appliance - 32
Let's go over the parameters that you can use with the setup-root-ssh script:
-C: Targets all servers in the cluster, using the list of servers in /opt/oracle/bda/cluster-hosts-infiniband. If you do not specify the target servers, then setup-root-ssh uses all servers in the rack.
-c host1, host2,…: Targets the specified servers.
-g groupfile: Targets a user-defined set of servers listed in a groupfile. You can enter either server names or IP addresses in the file, one per line.
-j "eth0_ips[range]": Specifies the range of servers in a starter rack [1-6] or a starter rack with additional servers [1-12].
-h: Displays Help.
-p password: Specifies the root password on the command line. Oracle recommends that you omit this parameter. You will be prompted to enter the password, which the utility does not display on your screen.
The example in the slide uses the setup-root-ssh command without any arguments. You will be prompted to enter the password for each server and see the message ssh key added from each server. You can now run any ssh command on any server in the rack without entering a password.
Introduction to the Oracle Big Data Appliance - 33
The dcli utility executes commands across a group of servers on the Oracle BDA and
returns the output. You can run the utility from any server.
The dcli utility requires passwordless Secure Shell (SSH) access between the local server and all
target servers. You run the dcli utility on the local server.
The commands specified in dcli execute on the specified target servers (or on all servers if you
omit the server names).
Introduction to the Oracle Big Data Appliance - 34
The syntax is: dcli [option] [command]
You can omit all options to run a command on ALL servers in the current rack. The command
is any command that runs from the operating system prompt.
Let's look at some of the available options:
-c nodes: Specifies a comma-separated list of the Oracle BDA servers where the command
is executed.
-C: Uses the list of servers in /opt/oracle/bda/cluster-rack-infiniband as the
target.
-g groupfile: Specifies a file containing a list of the Oracle BDA servers where the
command is executed. You can use either server names or IP addresses in the file, one per
line.
-h or --help: Displays a description of the commands.
-t: Lists the target servers.
--version: Displays the dcli version number.
Introduction to the Oracle Big Data Appliance - 35
After you set up SSH between the local server and all target servers, you can use the dcli utility. Let's go over some examples.
In the first example, we use dcli imageinfo to verify the image version on all target
servers.
In the second example, we use dcli reboot to reboot all the servers in the cluster.
In the third example, we use dcli -t, which returns the default list of target servers.
In the last example, we use dcli "ls /root/BDA*" to list the contents of the /root
folder on all servers for any file names that start with BDA.
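If you script these calls, it can help to build the dcli argument list in one place. The following is a minimal sketch that only constructs the arguments and does not run anything; build_dcli_argv is a hypothetical helper, and it relies only on the dcli [option] [command] syntax and the -c option described in the previous slide.

```python
from typing import List, Optional, Sequence


def build_dcli_argv(command: str, nodes: Optional[Sequence[str]] = None) -> List[str]:
    """Build the argument list for a dcli call.

    With no nodes, dcli targets all servers in the current rack.
    With nodes, the -c option receives a comma-separated list of servers.
    """
    argv = ["dcli"]
    if nodes:
        argv += ["-c", ",".join(nodes)]
    # The command is kept as a single argument, which mirrors quoting it
    # on the command line, as in dcli "ls /root/BDA*".
    argv.append(command)
    return argv


if __name__ == "__main__":
    print(build_dcli_argv("imageinfo"))
    print(build_dcli_argv("ls /root/BDA*", nodes=["node01", "node02"]))
```

On an actual Oracle BDA server, a list like this could be passed to Python's subprocess.run; here it is only printed.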
Introduction to the Oracle Big Data Appliance - 36
The bdacli utility:
• Queries various configuration files to return information about the rack, cluster, server,
InfiniBand network, and software patches.
• Adds and removes patches and optional services.
• Migrates critical services between critical nodes, and adds and removes servers from a
cluster.
• Displays usage information if no parameters are included on the command line or the
values are undefined.
The syntax is bdacli action [parameters]. The action element of the syntax can be
one of the following, among others:
• help: Displays general usage information for bdacli, a list of actions, and a list of
supported parameters for the getinfo action.
• {add | remove} patch patch_number: Adds or removes a software patch on the
Oracle BDA that matches patch_number. You must log in as root to use add or
remove.
• admin_cluster parameter node_name: Enables you to administer the nodes in a
cluster in response to a failing server.
Introduction to the Oracle Big Data Appliance - 37
The bdacli utility uses several actions and parameters. On this page, we will just mention a
few of those actions and parameters. For a complete list, refer to the Oracle BDA Owner's Guide. Let's look at some of the parameters that we can use with the admin_cluster action.
migrate: Moves the services from a critical node to a noncritical node, and decommissions
the failing server in Cloudera Manager. You specify the name of the failing critical node, and
the utility selects the noncritical node for the migration. When migration is complete, the new
node has all of the functionality of the original critical node. You can only migrate a critical
node, and should do so only when it is failing.
decommission: Removes the specified node from the cluster and decommissions the server
in Cloudera Manager. It also updates the Mammoth files. You can decommission a failing,
noncritical node.
recommission: Removes the node from the list of decommissioned nodes, and
recommissions the server in Cloudera Manager. Use this command after decommissioning
and repairing a failing server.
reprovision: Restores a server to the cluster as a noncritical node, and re-commissions
the server in Cloudera Manager. Use this command after migrating the services of a critical
node and repairing the failing server.
Introduction to the Oracle Big Data Appliance - 38
Let's go over some examples:
In the first example, the command bdacli admin_cluster migrate node01 restores HA with a single command. This command moves the services from the critical node01 to a noncritical node, and decommissions the failing server in Cloudera Manager. You specify the name of the failing critical node such as node01, and the utility selects the noncritical node for the migration. When migration is complete, the new node has all of the functionality of the original critical node.
In the second example, the bdacli admin_cluster reprovision node01 command restores a server to the cluster as a noncritical node, and recommissions the server in Cloudera Manager. Use this command after migrating the services of a critical node and repairing the failing server.
In the third example, the bdacli getinfo rack_name command returns the rack's name and the bdacli getinfo cluster_name command returns the cluster's name.
In the fourth example, the bdacli getinfo rack_server_names command returns an ordered list of the host names in the rack.
In the next example, the bdacli getinfo server_mammoth_installed command returns true if the Mammoth utility has deployed the Oracle BDA software on this server; otherwise, it returns false.
In the final example, the bdacli add patch 1234 command installs patch 1234.
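The choice among the four admin_cluster parameters depends on two things: whether the failing node was critical, and whether it has already been repaired. The sketch below encodes that decision as described in this lesson. It is an illustration only; admin_cluster_command is a hypothetical helper that returns the command text and does not run bdacli.

```python
def admin_cluster_command(node: str, critical: bool, repaired: bool) -> str:
    """Return the bdacli admin_cluster command that fits the situation.

    critical: the node hosted critical services when it started failing.
    repaired: the server has been repaired and should rejoin the cluster.
    """
    if not repaired:
        # Failing critical nodes are migrated; failing noncritical nodes
        # are decommissioned.
        parameter = "migrate" if critical else "decommission"
    else:
        # After a migration the repaired server is reprovisioned as a
        # noncritical node; after a decommission it is recommissioned.
        parameter = "reprovision" if critical else "recommission"
    return f"bdacli admin_cluster {parameter} {node}"


if __name__ == "__main__":
    print(admin_cluster_command("node01", critical=True, repaired=False))
    print(admin_cluster_command("node01", critical=True, repaired=True))
```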
Introduction to the Oracle Big Data Appliance - 39
In this example, we have a 12-node BDA rack in production, with Hadoop HA and security set up, that is ready to load data. The rack name is bda1 and the CDH cluster name is
bda1h1. We will see what happens when you expand the number of nodes in the cluster to
span two racks.
As we saw earlier in this lesson, you can perform a full install of the BDA software on the Oracle BDA cluster by using the mammoth utility: mammoth -i bda1h1, where the -i argument
runs all the steps on the cluster and bda1h1 is the name of the cluster that we chose while
running the Oracle BDA Configuration Generation Utility.
Introduction to the Oracle Big Data Appliance - 40
All services are installed on all nodes in a CDH cluster, but individual services run only on
designated nodes. There are slight variations in the location of the services depending on the
configuration of the cluster.
Critical nodes are required for the cluster to operate normally and provide all services to
users. In contrast, the cluster continues to operate with no loss of service when a noncritical
node fails. On single-rack clusters, the critical services are installed initially on the first four nodes of the cluster, namely node01, node02, node03, and node04. The remaining nodes
(node05 up to nodenn) run only noncritical services. If a hardware failure occurs on one of
the Active critical nodes, the corresponding Standby node takes over, and you can also move the services to another, noncritical server. For example, if node02 (Standby NameNode)
fails, then you might move its critical services to any available noncritical node in the cluster, such as node05. The table in the slide identifies the initial location of services for CDH
clusters that are configured on a single rack.
Introduction to the Oracle Big Data Appliance - 41
Let's say that on day 90, we expand cluster bda1h1 by 12 nodes. node13 to node18 are added
to the bda1 rack, which makes it a full rack. In addition, the second starter BDA rack, bda2,
contains node01 to node06. The bda1h1 cluster now spans two racks.
To expand the cluster to use 12 more nodes, we use the mammoth command with the -e
argument as follows:
./mammoth -e bda1node13, bda1node14, bda1node15, …, bda2node01, … bda2node06
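The node list passed to the -e argument can be generated rather than typed by hand. The following is a minimal Python sketch, assuming the rack and node naming pattern used in this example (node13 to node18 on bda1, node01 to node06 on bda2). The helper names node_names and expansion_command are hypothetical and are not part of Mammoth; the sketch only builds and prints the command string in the comma-separated form used in this lesson.

```python
def node_names(rack, first, last):
    """Return host names such as bda1node13 for node numbers first..last."""
    return [f"{rack}node{n:02d}" for n in range(first, last + 1)]


def expansion_command(nodes):
    """Build the mammoth expansion command string (hypothetical helper)."""
    return "./mammoth -e " + ", ".join(nodes)


# Six nodes complete rack bda1; six more come from the starter rack bda2.
new_nodes = node_names("bda1", 13, 18) + node_names("bda2", 1, 6)
print(expansion_command(new_nodes))
```

Review the printed command before running it on the appliance; the sketch does not execute anything.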
When expanding a cluster from one to two racks, mammoth moves the critical services from node02 (Standby NameNode) and node04 (Standby ResourceManager) of the
first rack to node01 and node02 of the second rack. node02 of the first rack becomes a
noncritical node. The Active NameNode continues to run on node01 of bda1 and the Active
ResourceManager continues to run on node03 of bda1.
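The relocation of the standby roles can be pictured as a simple mapping. The following Python sketch is an illustration only: the dictionary layout and the function expand_to_two_racks are hypothetical, and the pairing of bda1 node04 with bda2 node02 is assumed from the order in which the nodes are listed in this lesson.

```python
# Critical role placement on the single rack, as described in this lesson.
single_rack = {
    ("bda1", "node01"): "Active NameNode",
    ("bda1", "node02"): "Standby NameNode",
    ("bda1", "node03"): "Active ResourceManager",
    ("bda1", "node04"): "Standby ResourceManager",
}

# Moves applied when the cluster grows from one rack to two racks.
# The second pairing is an assumption based on the listing order.
MOVES = {
    ("bda1", "node02"): ("bda2", "node01"),
    ("bda1", "node04"): ("bda2", "node02"),
}


def expand_to_two_racks(placement):
    """Return a new placement with the standby roles moved to the second rack."""
    return {MOVES.get(node, node): role for node, role in placement.items()}


two_racks = expand_to_two_racks(single_rack)
for (rack, node), role in sorted(two_racks.items()):
    print(rack, node, role)
```

After the move, each active role and its standby sit on different racks, so the loss of one rack does not take out both members of a pair.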
Introduction to the Oracle Big Data Appliance - 42
If a server starts failing, you must take steps to maintain the services of the cluster with as little interruption as possible. You can use the bdacli utility to easily manage a failing server.
One of the management steps is called decommissioning. Decommissioning stops all roles
for all services, thereby preventing data loss. Cloudera Manager requires that you
decommission a CDH node before retiring it. When a noncritical node fails, there is no loss of service. However, when a critical node such as node01 fails in a CDH cluster, services with a
single point of failure are unavailable. You must decide between these alternatives:
1. Wait for repairs to be made, and endure the loss of service until they are complete.
2. Move the critical services to another node. This choice may require that some clients are
reconfigured with the address of the new node. For example, if the second
ResourceManager node (typically node03) fails, then users must redirect their browsers to
the new node to access Cloudera Manager.
Both the NameNodes and the ResourceManagers in the Oracle BDA have HA and
automatic failover. If the first (active) NameNode fails or goes offline (such as during a restart), then
the second (hot standby) NameNode automatically takes over to maintain the normal activities of the cluster. In our example, if the Active NameNode fails on node01 in bda1,
then the Standby NameNode on node01 in bda2 takes over and becomes the Active
NameNode.
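The failover behavior can be modeled in a few lines. The following Python sketch is a toy model, not how Hadoop or the BDA implements failover; the class HAPair and its members are hypothetical names introduced for illustration, and the host names follow the example in this lesson.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HAPair:
    """An active/standby pair such as the two NameNodes (illustrative only)."""
    active: Optional[str]
    standby: Optional[str]

    def fail(self, node: str) -> None:
        """Record that `node` failed and promote the standby if needed."""
        if node == self.active:
            # The standby takes over; no backup remains afterwards.
            self.active, self.standby = self.standby, None
        elif node == self.standby:
            self.standby = None

    @property
    def has_redundancy(self) -> bool:
        """True only while both an active and a standby node exist."""
        return self.active is not None and self.standby is not None


namenodes = HAPair(active="bda1node01", standby="bda2node01")
namenodes.fail("bda1node01")
print(namenodes.active, namenodes.has_redundancy)
```

The model makes one point explicit: after a takeover the service keeps running, but the pair no longer has a standby until HA is restored.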
Introduction to the Oracle Big Data Appliance - 43
If the second NameNode (or second ResourceManager) is already active, it continues without
a backup. With only one NameNode (or one ResourceManager), the cluster is vulnerable to
failure. The cluster has lost the redundancy needed for automatic failover. You can use the bdacli admin_cluster migrate command to restore HA with a single command.
In the first example, the command bdacli admin_cluster migrate bda1node01
moves the services from the failing critical node01 to a noncritical node such as node10, and decommissions the failing server
in Cloudera Manager. You specify the name of the failing critical node such as bda1node01,
and the utility selects the noncritical node for the migration. When migration is complete, the
new node has all of the functionality of the original critical node.
Once the failing node/server is repaired or replaced, you can reprovision the server as a
noncritical node. Use the same name as the migrated node for node_name, such as bda1node01, as follows: bdacli admin_cluster reprovision bda1node01
When this command is completed, bda1node01 becomes a noncritical node.
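The bookkeeping behind migrate and reprovision can be sketched as follows. This Python model is an assumption-laden illustration: the functions migrate and reprovision are hypothetical stand-ins for the bdacli commands, and picking the first available noncritical node is a placeholder, because this lesson does not describe how the utility selects the target.

```python
def migrate(roles, failing, noncritical):
    """Move the critical role of `failing` to a noncritical node.

    `roles` maps node name -> critical role; `noncritical` lists healthy
    noncritical nodes. This sketch simply picks the first one.
    """
    target = noncritical[0]
    new_roles = dict(roles)
    new_roles[target] = new_roles.pop(failing)
    remaining = [n for n in noncritical if n != target]
    return new_roles, remaining, target


def reprovision(noncritical, repaired):
    """Return the noncritical node list with the repaired server added back."""
    return noncritical + [repaired]


roles = {"bda1node01": "NameNode"}
spare = ["bda1node10", "bda1node11"]

roles, spare, target = migrate(roles, "bda1node01", spare)
spare = reprovision(spare, "bda1node01")
print(target, roles, spare)
```

The repaired server returns under its original name, but as a noncritical node; the critical role stays on the node that received it during migration.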
Introduction to the Oracle Big Data Appliance - 44
In this lesson, you should have learned how to:
• Set up and configure the Oracle BDA
• Identify some of the hardware and software components of the Oracle BDA
Introduction to the Oracle Big Data Appliance - 45
In this course, we discussed Introduction to the Hadoop Ecosystem, Introduction to the
Oracle BDA, Oracle BDA Pre-Installation Steps, and Working With Mammoth.
You should now be able to:
• Define the Hadoop ecosystem and its components
• Describe Apache Hadoop Distributed File System (HDFS), MapReduce, Spark, and
YARN
• Complete the BDA Site Checklists
• Run the Oracle BDA Configuration Utility
• Install the Oracle BDA Mammoth software on the Oracle BDA
• Use, administer, and manage the Oracle Big Data Appliance (BDA)
Introduction to the Oracle Big Data Appliance - 46
The Oracle Learning Library offers other self-paced courses about Oracle Big Data Appliance.
Visit the Oracle Learning Library to learn about the courses.
Introduction to the Oracle Big Data Appliance - 47
Oracle University offers in-class courses about Oracle Big Data and Oracle NoSQL
Database. Visit Oracle University to learn about the following courses:
• Oracle Big Data Fundamentals
• Oracle NoSQL Database for Developers
• Oracle NoSQL Database for Administrators
Introduction to the Oracle Big Data Appliance - 48
The Oracle Learning Library offers many free demonstrations and tutorials.
And, of course, the Oracle Big Data Appliance documentation and online help embedded
within the product are also valuable resources.