
Deployment Guide: IBM® BigInsights™ with IBM® Spectrum Scale™ and Ambari™

March 28, 2016 Version 1.2

Written for:

Apache© Ambari V2.1
IBM BigInsights V4.1
IBM Open Platform with Apache Hadoop V4.1
IBM Spectrum Scale V4.1.1 and up


Contents

Figures and Tables
1. Overview
    1.1 Installation options
    1.2 Package download
        IBM Spectrum Scale Hadoop Connector
        IBM Spectrum Scale Ambari integration module
    1.3 A new cluster setup
2. Known limitations
3. Preparing the environment
    3.1 Validating the network
    3.2 Set up password-less for root
    3.3 Preparing the environment for IOP
4. Dependencies
    4.1 Software packages
    4.2 Kernel RPMs
5. Set up the Yum repositories
    5.1 Ambari and IOP mirror repositories
    5.2 The IBM Spectrum Scale Yum repository
    5.3 OS Repository
6. Ambari installation
    6.1 Install the Ambari-Server RPM
    6.2 Install the IBM Spectrum Scale Ambari integration module
    6.3 Setting up the Ambari server
    6.4 Starting the Ambari server
7. Install IOP with IBM Spectrum Scale using Ambari
    7.1 Before you begin
    7.2 Ambari Wizard
    7.3 Create a cluster
        Welcome Screen
        Cluster Name
        Select Stack
        Install Options
        Confirm Hosts
        Choose Services
        Assign Masters
        Assign Slaves and Clients
        Customize Services
    7.4 Starting deployment
        Review
    7.5 IBM Spectrum Scale deployment modes
        Deploy IOP over existing IBM Spectrum Scale file system (FPO)
        Deploy IOP over existing IBM Spectrum Scale file system (ESS)
        Additional steps for deploying IOP over existing IBM Spectrum Scale file system – FPO or ESS
        Deploy IOP over new IBM Spectrum Scale file system (FPO support only)
    7.6 Verify and Test Installation
Appendix
    A. Preparing a stanza File
    B. IBM Spectrum Scale-FPO Deployment
        Disk-partitioning algorithm
        Failure Group selection rules
        Rack Mapping File
        Partitioning Function Matrix in Automatic Deployment
    C. Dual-network deployment
        Two network adapters, configured with different sub-network addresses
        Two network adapters, configured with same sub-network addresses
    D. BigInsights Value Add Services on IBM Spectrum Scale
        Troubleshooting Value Add Services
    E. Node management
        Add Node
        Remove Node
    F. Upgrade IBM Spectrum Scale to Latest PTF
    G. Upgrade the IBM Spectrum Scale Ambari integration module
    H. IBM Spectrum Scale UI
    I. Collecting the snap data
    J. HTTPS/REST API
    K. Resources
FAQ
Notices


Figures and Tables

Figure 1 ambari login
Figure 2 ambari welcome page
Figure 3 ambari cluster name
Figure 4 ambari select stack
Figure 5 ambari install options – host list
Figure 6 ambari confirm hosts
Figure 7 ambari choose services
Figure 8 ambari assign masters
Figure 9 ambari assign slaves and clients
Figure 10 ambari customize service iop tabs
Figure 11 ambari IBM Spectrum Scale customize services standard and advanced settings
Figure 12 ambari IBM Spectrum Scale custom services advance list
Figure 13 ambari IBM Spectrum Scale data and metadata replicas
Figure 15 ambari deployment review
Figure 14 ambari IBM Spectrum Scale hadoop local cache file stanza
Figure 16 ambari nsd stanza
Figure 17 ambari rack mapping
Figure 18 ambari hosts panel
Figure 19 ambari hosts gpfs node components
Figure 20 ambari hosts actions
Figure 21 ambari hosts actions delete host
Figure 22 ambari upgrade IBM Spectrum Scale
Figure 23 ambari dashboard actions stop all
Figure 24 ambari dashboard add services
Figure 25 ambari upgrade choose services
Figure 26 ambari add service wizard
Figure 27 ambari assign nodes - hadoop connector + IBM Spectrum Scale node
Figure 28 ambari customize services verification
Figure 29 ambari review panel
Figure 30 ambari after upgrade dashboard
Figure 31 ambari collect snap data

Table 1 hadoop connector and ambari integration module
Table 2 ambari and iop repository packages
Table 3 IBM Spectrum Scale editions
Table 4 IBM Spectrum Scale checklist parameters
Table 5 IBM Spectrum Scale partitioning function matrix


1. Overview

This document describes how to install and configure the IBM® BigInsights™ Open Platform with Apache© Hadoop stack on an IBM® Spectrum Scale™ file system by using the Apache© Ambari framework.

The IBM Open Platform with Apache Hadoop (IOP) supports the Hadoop Distributed File System (HDFS). IBM Spectrum Scale, formerly known as IBM General Parallel File System (IBM GPFS), is also supported for customers who require advanced capabilities such as a POSIX-compliant file system, information lifecycle management, incremental backups, high-performance replication, and FIPS-140 / NIST compliant encryption.

With Ambari, the system administrator can provision, manage, and monitor a Hadoop cluster. Ambari can also

start and stop IBM Spectrum Scale services on all the nodes in the cluster and report the basic status

information through the Ambari web user interface (UI).

1.1 Installation options

You can install IBM Spectrum Scale in one of the following ways:

- Install IBM Spectrum Scale as part of the Ambari IOP installation. With this method, a new IBM Spectrum Scale File Placement Optimizer (FPO) file system is deployed and configured. IOP is then installed and configured to use IBM Spectrum Scale instead of HDFS. This procedure is the basic installation based on best practice policies for a big data cluster on IBM Spectrum Scale.

- Install IBM Spectrum Scale manually before installing Ambari IOP. During installation of IOP, the pre-created IBM Spectrum Scale file system is detected and only the Hadoop integration components for IBM Spectrum Scale are deployed. The installer installs and configures the Hadoop workload on top of the existing IBM Spectrum Scale file system without validating the pre-existing IBM Spectrum Scale configuration. The existing installation can be FPO or shared storage, such as an Elastic Storage Server (ESS) installation. This procedure is intended for advanced users.

- Add a node to an existing Ambari IOP cluster. Ambari adds nodes and installs the IBM Spectrum Scale software onto the existing IBM Spectrum Scale cluster, such as an ESS configuration, but does not create any Network Shared Disks (NSDs) or add NSDs to the existing file system.

You can view the current best practices for installation here:

https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/References

In all cases, a local repository for IBM Spectrum Scale is required. Ambari reads from the repository to deploy

IBM Spectrum Scale, if it is not already created, and the following Hadoop integration components:

Module | Description

IBM Spectrum Scale Hadoop Connector | Provides an implementation of the Hadoop FileSystem API, thereby enabling Hadoop applications to use IBM Spectrum Scale as the distributed file system by using either the IBM Spectrum Scale (gpfs:///) or HDFS (hdfs:///) URI scheme.

IBM Spectrum Scale Ambari integration module | Enables basic administration of IBM Spectrum Scale within the Ambari console. When installed, an IBM Spectrum Scale service appears in the Ambari interface instead of the standard HDFS service.

Table 1: Hadoop connector and Ambari integration module

For a list of limitations, see “Known Limitations”.

1.2 Package download

IBM Spectrum Scale Hadoop Connector

The IBM Spectrum Scale Hadoop connector is installed independently of IBM Spectrum Scale and is provided as an RPM file. The Hadoop connector supports both IBM Spectrum Scale ESS and IBM Spectrum Scale FPO.

Download IBM Spectrum Scale Hadoop Connector from the IBM Spectrum Scale wiki here:

References: Hadoop Connector Download

The module name is hadoop-gpfs-connector-2.7.0-(version).

WARNING: Saving this package in /root/ can cause installation problems.

Note: There are two types of connectors: 1st Generation and 2nd Generation. Only the 1st Generation connector is currently supported; the 2nd Generation connector does not yet have any Ambari support. Support for the 2nd Generation connector is planned by the end of 2Q 2016 and will be announced on the wiki.

The Ambari-based installer attempts to detect a pre-existing Hadoop connector on each node. If one is found,

the installer does not overwrite or re-deploy the connector. If IBM Spectrum Scale Hadoop Connector is not

detected on any node, the installer deploys the IBM Spectrum Scale Hadoop Connector RPM file on all nodes

from the IBM Spectrum Scale repository.

IBM Spectrum Scale Ambari integration module

The IBM Spectrum Scale Ambari integration module is independent of IBM Spectrum Scale and is provided as a separate RPM file. The Hadoop connector supports both IBM Spectrum Scale ESS and IBM Spectrum Scale FPO deployments.


For traditional Hadoop clusters that use HDFS, an HDFS service appears in the Ambari console to provide a graphical management interface for the HDFS configuration (hdfs-site.xml) and the Hadoop cluster itself (core-site.xml). Through the Ambari HDFS service, you can start and stop the HDFS service, make configuration changes, and propagate those changes across the cluster.

IBM Spectrum Scale replaces HDFS, and the Ambari HDFS service is no longer used. The IBM Spectrum Scale

Ambari integration module, provided as an RPM, creates an Ambari IBM Spectrum Scale service to start and

stop IBM Spectrum Scale and make the configuration changes.

Download the IBM Spectrum Scale Ambari integration module from the IBM Spectrum Scale wiki here:

References: BigInsight Enterprise Manager

The module name is gpfs.ambari-iop_4.1-(version).

WARNING: Saving this package in /root/ can cause installation problems.

1.3 A new cluster setup

The installation process attempts to detect an existing IBM Spectrum Scale file system. For IBM Spectrum Scale FPO, which is a multi-node, just-a-bunch-of-disks (JBOD) configuration, the installer can set up a new clustered file system if the hostnames of all the nodes and the disk devices available at each node are supplied in a stanza file. The

installer designates manager roles and quorum nodes and creates NSDs and the file system. The best practices

for the Hadoop configuration are automatically implemented.

2. Known limitations

The following are the known limitations and workarounds.

Note: This is an iterative document. For the latest version of this document, see the IBM Spectrum Scale

wiki, References: BigInsight Enterprise Manager.

- Upgrading Ambari is not supported.

- The WebHDFS protocol is not supported. As a workaround, the HTTPFS protocol can be used. The HTTPFS setup is required if you want to use Big R.

- Ambari File View does not work because it requires WebHDFS. The alternatives include:

    - SMB export for Windows Explorer access
    - NFS export for NFS clients
    - A tool such as WinSCP that has a file browser capability and can be used to upload files

  All of these exploit the POSIX compliance feature of IBM Spectrum Scale.

- While deploying IOP over an existing IBM Spectrum Scale cluster, the IBM Spectrum Scale cluster must be started and the file system must be mounted on all the nodes before starting the Ambari deployment.

- If you cannot connect to the Ambari Server through the web browser, check whether the following message is displayed in the Ambari Server log, which is located in /var/log/ambari-server:

    WARN [main] AbstractConnector:335 - insufficient threads configured for [email protected]:8080

  The size of the thread pool can be increased to match the number of CPUs on the node where the Ambari Server is running.

  For example, if you have 160 CPUs, add the following properties to /etc/ambari-server/conf/ambari.properties:

    server.execution.scheduler.maxThreads=160
    agent.threadpool.size.max=160
    client.threadpool.size.max=160

- Only IBM Spectrum Scale versions from 4.1.1 through 4.2 are supported. Ambari supports the automatic installation of IBM Spectrum Scale Version 4.1.1 and later.

- After the IBM Spectrum Scale base packages are installed by Ambari, the IBM Spectrum Scale PTF packages must be upgraded manually. See Upgrade IBM Spectrum Scale to Latest PTF.

6. After adding and removing nodes from Ambari, some aspects of the IBM Spectrum Scale configuration, such as pagepool as seen by running the mmlsconfig command, are not refreshed until after the next restart of the IBM Spectrum Scale Ambari service. However, this does not impact the functionality.

7. For CentOS, create the /etc/redhat-release file to simulate a Red Hat environment. Otherwise, the Ambari deployment will fail. For example:

    # cat redhat-release
    Red Hat Enterprise Linux Server release 6.6 (Santiago)

8. The Big SQL uninstallation fails when IBM Spectrum Scale is used. Workaround: create the following symbolic link on all nodes:

    # ln -s /var/lib/ambari-agent/cache/stacks/BigInsights/4.1.SpectrumScale/\
    services/BIGSQL \
    /var/lib/ambari-agent/cache/stacks/BigInsights/4.1/services/BIGSQL


9. If Symphony is installed, the Symphony fix pack, sym-7.1-build391507, is required.

10. While adding a new node to a cluster on IBM Spectrum Scale with Ambari, you must add all the services. Attempts to add individual services, such as Sqoop or the Big SQL worker, after adding the node fail due to the dependency on the HDFS client.
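The thread pool workaround listed among the limitations above sizes three properties to the CPU count. The following is a minimal sketch of how those lines could be generated; the function name threadpool_props is hypothetical, and the commented usage assumes that the properties are not already present in ambari.properties (appending would otherwise create duplicate keys).

```shell
#!/bin/sh
# Print the three thread pool properties named in the workaround,
# sized to the CPU count given as the first argument.
# threadpool_props is an illustrative helper, not part of Ambari.
threadpool_props() {
    cpus="$1"
    printf 'server.execution.scheduler.maxThreads=%s\n' "$cpus"
    printf 'agent.threadpool.size.max=%s\n' "$cpus"
    printf 'client.threadpool.size.max=%s\n' "$cpus"
}

# Example (not executed here): use this node's CPU count, then restart.
#   threadpool_props "$(nproc)" >> /etc/ambari-server/conf/ambari.properties
#   ambari-server restart
```

Review the generated lines before appending them, and back up ambari.properties first.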

3. Preparing the environment

3.1 Validating the network

While using a private network for Hadoop data nodes, ensure that all nodes, including the management nodes,

have hostnames bound to the faster internal network.

On all nodes, the hostname -f command must return the FQDN of the faster internal network. This network can be a bonded network. If a node does not return the FQDN, modify /etc/sysconfig/network and use the hostname command to change the FQDN of the node.

If the nodes in your cluster have two network adapters, see Dual Network Deployment.
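As a quick sanity check of the rule above, the following sketch tests whether a name has the shape of an FQDN (at least two non-empty, dot-separated labels). The helper name is_fqdn is hypothetical. The check looks only at the form of the name; it cannot tell whether the name is bound to the faster internal network, which still has to be verified against your network configuration.

```shell
#!/bin/sh
# Return 0 if the argument looks like a fully qualified domain name,
# that is, it contains a dot and has no empty labels.
# is_fqdn is an illustrative helper, not part of any IBM package.
is_fqdn() {
    case "$1" in
        ""|.*|*.|*..*) return 1 ;;  # empty, leading/trailing dot, or empty label
        *.*)           return 0 ;;  # at least one dot between labels
        *)             return 1 ;;  # short hostname without a domain
    esac
}

# Example (not executed here):
#   is_fqdn "$(hostname -f)" || echo "hostname -f does not return an FQDN"
```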

3.2 Set up password-less for root

The IBM Spectrum Scale master node is a special role assigned to the node on which Ambari is installed and from which IBM Spectrum Scale commands are issued. Password-less SSH for root from the IBM Spectrum Scale master node to all other IBM Spectrum Scale nodes must be configured.

In most cases, the IBM Spectrum Scale master node is the Ambari server node.

- If the IBM Spectrum Scale master node is the Ambari server node, the password-less SSH requirement is met when you perform the IOP pre-install tasks.

- If the IBM Spectrum Scale master node is not the Ambari server node, set up password-less SSH from the IBM Spectrum Scale master node to all other nodes.
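One way to confirm the requirement above is to attempt a non-interactive login to every node from the master node. This is a sketch under assumptions: the hosts file (one hostname per line) and the function name check_passwordless are hypothetical, and the SSH_CMD variable exists only so that the loop can be dry-run without contacting any host.

```shell
#!/bin/sh
# Try a non-interactive root login to every host listed in the given file.
# Prints one result line per host and returns non-zero if any login fails.
# check_passwordless and SSH_CMD are illustrative names.
check_passwordless() {
    hosts_file="$1"
    failed=0
    while IFS= read -r host; do
        [ -n "$host" ] || continue
        # BatchMode=yes makes ssh fail instead of prompting for a password;
        # -n keeps ssh from consuming the host list on standard input.
        if ${SSH_CMD:-ssh} -n -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true; then
            echo "ok: $host"
        else
            echo "FAILED: $host"
            failed=1
        fi
    done < "$hosts_file"
    return "$failed"
}

# Example (not executed here), run as root on the master node:
#   check_passwordless /tmp/cluster-hosts.txt
```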

3.3 Preparing the environment for IOP


Before installing IBM Open Platform (IOP), pre-installation tasks must be performed on each node. These pre-

installation tasks are the same for installing Hadoop with HDFS or IBM Spectrum Scale.

1. Perform the steps described at Preparing Your Environment: https://ibm.biz/BdHGfR

2. Pre-create the Hadoop service IDs and groups according to https://ibm.biz/BdHGfX

    - If you are using LDAP, create the IDs and groups on the LDAP server and ensure that all nodes can authenticate the users.

    - If you are using local IDs, the IDs must be pre-created on all nodes. Although Ambari can create service IDs and groups during installation, pre-creating them is mandatory for IBM Spectrum Scale deployments because Ambari does not guarantee consistent UIDs and GIDs across nodes (see JIRA AMBARI-10186). Consistent IDs are not critical for HDFS, but they are critical for IBM Spectrum Scale because it is a kernel-level file system.

    For example:

        groupadd --gid 1000 hadoop
        groupadd --gid 1016 rddcached   #optionally align rddcached GID with UID
        useradd -g hadoop -u 1001 ams
        useradd -g hadoop -u 1002 hive
        useradd -g hadoop -u 1003 oozie
        useradd -g hadoop -u 1004 ambari-qa
        useradd -g hadoop -u 1005 flume
        useradd -g hadoop -u 1006 hdfs
        useradd -g hadoop -u 1007 solr
        useradd -g hadoop -u 1008 knox
        useradd -g hadoop -u 1009 spark
        useradd -g hadoop -u 1010 mapred
        useradd -g hadoop -u 1011 hbase
        useradd -g hadoop -u 1012 zookeeper
        useradd -g hadoop -u 1013 sqoop
        useradd -g hadoop -u 1014 yarn
        useradd -g hadoop -u 1015 hcat
        useradd -g rddcached -u 1016 rddcached   #optionally align rddcached GID with UID
        useradd -g hadoop -u 1017 kafka
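Because the requirement above is that UIDs are identical on every node, it can help to compare listings collected from different nodes. The following sketch assumes plain "name uid" text files, one per node; the helper names list_uids and uid_mismatches, and the file names in the usage comment, are hypothetical.

```shell
#!/bin/sh
# Print "name uid" for each given user as seen on the local node.
# A user that does not exist is reported with the uid "missing".
list_uids() {
    for user in "$@"; do
        uid="$(id -u "$user" 2>/dev/null)" || uid="missing"
        echo "$user $uid"
    done
}

# Compare two "name uid" listings (reference first, other node second)
# and print one line for every user whose UID differs between them.
uid_mismatches() {
    awk 'NR == FNR { want[$1] = $2; next }
         ($1 in want) && want[$1] != $2 { print $1 ": " want[$1] " vs " $2 }' "$1" "$2"
}

# Example (not executed here):
#   list_uids hdfs yarn hive > /tmp/uids.$(hostname -s)   # on each node
#   uid_mismatches /tmp/uids.node1 /tmp/uids.node2        # after collecting
```

An empty result from uid_mismatches means that the users present in both listings have the same UIDs. The same approach applies to GIDs if the listings are built with id -g.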

4. Dependencies

4.1 Software packages


This section lists only the dependencies for Ambari and IBM Spectrum Scale. The dependencies for Hadoop are not listed in this section. The installer must be able to access a RHEL repository from every node of the Hadoop cluster. A failure in executing the yum install <RPM> command causes the overall installation process to fail.

The following packages must be installed on the Ambari server node:

- postgresql
- postgresql-server
- postgresql-libs

The following packages must be installed on all IBM Spectrum Scale nodes:

- ksh
- libstdc++
- libstdc++-devel
- compat-libstdc++ (x86_64 only; not needed for ppc64/ppc64le)
- kernel
- kernel-devel
- gcc-c++
- imake (x86_64 only; not needed for ppc64/ppc64le)
- make

The following recommended packages can be installed on all nodes:

- acl, libacl: to enable Hadoop ACL support
- libattr: to enable Hadoop extended attributes

Some of these packages are installed by default while installing the operating system.
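The package lists above can be checked before starting the installation. This is a minimal sketch that relies on rpm -q returning a non-zero status for a package that is not installed; the function name missing_pkgs and the PKG_QUERY override (used only for dry runs) are hypothetical. Pass the exact package names used by your distribution, because some names in the list, such as compat-libstdc++, may carry a version suffix in the repository.

```shell
#!/bin/sh
# Print the name of every package from the argument list that the
# package query reports as not installed. By default the query is
# "rpm -q"; PKG_QUERY can replace it for a dry run.
missing_pkgs() {
    for pkg in "$@"; do
        ${PKG_QUERY:-rpm -q} "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}

# Example (not executed here), on an IBM Spectrum Scale node:
#   missing_pkgs ksh libstdc++ libstdc++-devel kernel kernel-devel gcc-c++ make
```

An empty result means that every listed package is installed.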

4.2 Kernel RPMs

Check the kernel RPMs that are installed. Unlike HDFS, IBM Spectrum Scale is a kernel-level file system and as

such, integrates tightly with the operating system. This is a critical dependency. Ensure the environment has

matching kernel, kernel-devel, and kernel-headers.

1. On all nodes, check the kernel RPMs that are installed:

rpm -qa | grep kernel

2. On all nodes, confirm that the output includes the following:

kernel-headers

kernel-devel

kernel

If any of the kernel RPMs are missing, install them. For example, if the kernel package is already installed but kernel-headers or kernel-devel is missing, run the following yum install command:

yum -y install kernel-headers kernel-devel

3. Validate that all of the kernel RPM versions match.

WARNING: Kernels are often updated after the original operating system installation. Ensure that the active kernel version matches the installed version of both kernel-devel and kernel-headers.

[root@mn01-dat ~]# uname -r
3.10.0-327.el7.ppc64le   <== Find kernel-devel and kernel-headers to match this
[root@mn01-dat ~]# rpm -qa | grep kernel
kernel-3.10.0-327.el7.ppc64le
kernel-tools-libs-3.10.0-327.el7.ppc64le
kernel-devel-3.10.0-327.el7.ppc64le   <== kernel-devel matches
kernel-bootwrapper-3.10.0-327.el7.ppc64le
kernel-headers-3.10.0-327.el7.ppc64le   <== kernel-headers matches
kernel-tools-3.10.0-327.el7.ppc64le

If multiple kernels are installed, ensure that only one instance of each of the kernel, kernel-headers, and kernel-devel packages remains installed. If older kernel packages are installed, remove them.
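The version comparison in step 3 can be scripted. The following is a minimal sketch under stated assumptions: the function name kernel_rpms_match is hypothetical, and it assumes that each package appears in the rpm -qa listing as <package>-<output of uname -r>, as in the example output above.

```shell
# Report whether kernel, kernel-devel, and kernel-headers match the running kernel.
# Usage: kernel_rpms_match "<output of uname -r>" "<output of rpm -qa | grep kernel>"
kernel_rpms_match() {
    local running="$1"
    local rpms="$2"
    local pkg
    local status=0
    for pkg in kernel kernel-devel kernel-headers; do
        # A matching package is listed as <name>-<running kernel version>.
        if printf '%s\n' "$rpms" | grep -qxF -- "${pkg}-${running}"; then
            echo "OK: ${pkg}-${running}"
        else
            echo "MISSING: ${pkg}-${running}"
            status=1
        fi
    done
    return "$status"
}

# Example on a real node:
#   kernel_rpms_match "$(uname -r)" "$(rpm -qa | grep kernel)"
```

The function returns a non-zero status when any of the three packages does not match, so it can be used as a gate in a pre-installation script.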

5. Set up the Yum repositories

IBM Open Platform (IOP) and BigInsights support installation from either the IBM-hosted Yum repositories or local mirror repositories. Reading from local mirror repositories is faster for multi-node clusters because each node performs its own download of the repository content.

IBM Spectrum Scale only supports installation through a local repository.

5.1 Ambari and IOP mirror repositories

1. Obtain the appropriate tarballs for the IBM Open Platform repository and Ambari packages, based on the operating system of the cluster. Only the operating systems and the hardware listed in the repository are supported.

Use either wget or curl -O to get the tarballs.

Linux x86-64 (RHEL6)
  Ambari     https://ibm-open-platform.ibm.com/repos/Ambari/rhel/6/x86_64/2.1.x/GA/2.1/ambari-2.1.0.0.el6.x86_64.tar.gz
  IOP        https://ibm-open-platform.ibm.com/repos/IOP/rhel/6/x86_64/4.1.x/GA/4.1.0.0/iop-4.1.0.0.el6.x86_64.tar.gz
  IOP-Utils  https://ibm-open-platform.ibm.com/repos/IOP-UTILS/rhel/6/x86_64/1.1/iop-utils-1.1.0.0.el6.x86_64.tar.gz

Linux x86-64 (RHEL7)
  Ambari     https://ibm-open-platform.ibm.com/repos/Ambari/rhel/7/x86_64/2.1.x/GA/2.1/ambari-2.1.0.0.el7.x86_64.tar.gz
  IOP        https://ibm-open-platform.ibm.com/repos/IOP/rhel/7/x86_64/4.1.x/GA/4.1.0.0/iop-4.1.0.0.el7.x86_64.tar.gz
  IOP-Utils  https://ibm-open-platform.ibm.com/repos/IOP-UTILS/rhel/7/x86_64/1.1/iop-utils-1.1.0.0.el7.x86_64.tar.gz

Power Linux LE (RHEL7)
  Ambari     https://ibm-open-platform.ibm.com/repos/Ambari/rhel/7/ppc64le/2.1.x/GA/2.1/ambari-2.1.0.0.el7.ppc64le.tar.gz
  IOP        https://ibm-open-platform.ibm.com/repos/IOP/rhel/7/ppc64le/4.1.x/GA/4.1.0.0/iop-4.1.0.0.el7.ppc64le.tar.gz
  IOP-Utils  https://ibm-open-platform.ibm.com/repos/IOP-UTILS/rhel/7/ppc64le/1.1/iop-utils-1.1.0.0.el7.ppc64le.tar.gz

TABLE 2 AMBARI AND IOP REPOSITORY PACKAGES

Note: If you are using a Windows system to download the files, you can also open the URLs in a web

browser and proceed to download the files. You can then transfer the files to the system that will host

the mirror repository files.

2. Select a server to act as the mirror repository server. This server requires the installation of the Apache HTTP server or a similar HTTP server. Every node in the Hadoop cluster must be able to access this repository server. The mirror server can be defined in the DNS, or you can add an entry for the mirror server in /etc/hosts on each node of the cluster.

a) Set up an HTTP server, such as Apache httpd, on the mirror repository server. If Apache httpd is not already installed, install it with the yum install httpd command. Start Apache httpd with one of the following commands:

apachectl start

or

service httpd start

Optional: Ensure the http server starts automatically on reboot:

chkconfig httpd on

b) Ensure that any firewall settings allow inbound HTTP access from your cluster nodes to your mirror web server.

c) On the mirror repository server, create a directory for your repositories, such as <document root>/repos. For Apache httpd with document root /var/www/html, type the following command:

mkdir -p /var/www/html/repos

d) Test your local repository by browsing to the web directory:

http://<Yum-Server>/repos

This example uses RHEL 7.1:

# rpm -qa |grep httpd
httpd-tools-2.4.6-31.el7_1.1.x86_64
httpd-2.4.6-31.el7_1.1.x86_64
# service httpd start
# service httpd status
Redirecting to /bin/systemctl status httpd.service
httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled)
   Active: active (running) since Tue 2015-10-20 14:37:06 EDT; 2 weeks 2 days ago
  Process: 6270 ExecReload=/usr/sbin/httpd $OPTIONS -k graceful (code=exited, status=0/SUCCESS)
 Main PID: 3984 (httpd)
   Status: "Total requests: 0; Current requests/sec: 0; Current traffic: 0 B/sec"
   CGroup: /system.slice/httpd.service
           ├─ 3984 /usr/sbin/httpd -DFOREGROUND
           ├─14409 /usr/sbin/httpd -DFOREGROUND
           ├─19766 /usr/sbin/httpd -DFOREGROUND
           ├─21123 /usr/sbin/httpd -DFOREGROUND
           ├─21124 /usr/sbin/httpd -DFOREGROUND
           ├─21125 /usr/sbin/httpd -DFOREGROUND
           ├─27486 /usr/sbin/httpd -DFOREGROUND
           ├─40268 /usr/sbin/httpd -DFOREGROUND
           ├─41950 /usr/sbin/httpd -DFOREGROUND
           ├─54022 /usr/sbin/httpd -DFOREGROUND
           └─59117 /usr/sbin/httpd -DFOREGROUND
Oct 20 14:37:06 c902mnp08 httpd[3984]: [Tue Oct 20 14:37:06.347078 2015] [so:warn] [pid 3984] AH01574: module rewrite_module is already loaded, skipping
Oct 20 14:37:06 c902mnp08 httpd[3984]: AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 172.168.1.1. ...s message
Oct 20 14:37:06 c902mnp08 systemd[1]: Started The Apache HTTP Server.
Oct 25 03:44:01 c902mnp08 systemd[1]: Reloading The Apache HTTP Server.
Oct 25 03:44:02 c902mnp08 httpd[50951]: [Sun Oct 25 03:44:02.021548 2015] [so:warn] [pid 50951] AH01574: module rewrite_module is already loaded, skipping
Oct 25 03:44:02 c902mnp08 systemd[1]: Reloaded The Apache HTTP Server.
Nov 02 03:06:01 c902mnp08 systemd[1]: Reloading The Apache HTTP Server.
Nov 02 03:06:01 c902mnp08 httpd[6270]: [Mon Nov 02 03:06:01.924506 2015] [so:warn] [pid 6270] AH01574: module rewrite_module is already loaded, skipping
Nov 02 03:06:02 c902mnp08 systemd[1]: Reloaded The Apache HTTP Server.
Hint: Some lines were ellipsized, use -l to show in full.
# systemctl enable httpd

3. Log in to the mirror repository server as root and extract the Ambari and IOP repository tarballs into

the repos directory under <document root> (For example: /var/www/html/repos).

For each of the tarballs downloaded in the previous step, run the following commands:

cd /var/www/html/repos

tar xzvf <path to downloaded tarball>

The result should be three subdirectories under /var/www/html/repos, one for each extracted tarball.

This example uses RHEL 7.1.

IOP

# cd /var/www/html/repos
# tar xzvf iop-4.1.0.0.el7.x86_64.tar.gz
IOP/rhel/7/x86_64/4.1.x/GA/4.1.0.0/
IOP/rhel/7/x86_64/4.1.x/GA/4.1.0.0/kafka/
IOP/rhel/7/x86_64/4.1.x/GA/4.1.0.0/kafka/noarch/

IOP-UTILS

# cd /var/www/html/repos
# tar xzvf iop-utils-1.1.0.0.el7.x86_64.tar.gz
IOP-UTILS/
IOP-UTILS/rhel/
IOP-UTILS/rhel/7/
IOP-UTILS/rhel/7/x86_64/
IOP-UTILS/rhel/7/x86_64/1.1/
...

Ambari

# cd /var/www/html/repos
# tar xzvf ambari-2.1.0.0.el7.x86_64.tar.gz
Ambari/rhel/7/x86_64/2.1x/GA/2.1/
Ambari/rhel/7/x86_64/2.1x/GA/2.1/ambari-server-2.1.0_IBM-4.x86_64.rpm
Ambari/rhel/7/x86_64/2.1x/GA/2.1/ambari-metrics-collector-2.1.0_IBM-4.x86_64.rpm
Ambari/rhel/7/x86_64/2.1x/GA/2.1/ambari-metrics-monitor-2.1.0_IBM-4.x86_64.rpm
Ambari/rhel/7/x86_64/2.1x/GA/2.1/ambari-metrics-hadoop-sink-2.1.0_IBM-4.x86_64.rpm
Ambari/rhel/7/x86_64/2.1x/GA/2.1/ambari-agent-2.1.0_IBM-4.x86_64.rpm
Ambari/rhel/7/x86_64/2.1x/GA/2.1/ambari-log4j-2.1.0_IBM_8.noarch.rpm
Ambari/rhel/7/x86_64/2.1x/GA/2.1/repodata/
Ambari/rhel/7/x86_64/2.1x/GA/2.1/repodata/652ba4ae68cb7da47520a2e5361e37d6aff4fce405693b0c9747b0f811bd0e3a-other.xml.gz
Ambari/rhel/7/x86_64/2.1x/GA/2.1/repodata/repomd.xml
Ambari/rhel/7/x86_64/2.1x/GA/2.1/repodata/745c87f70592df43313a23a64359b9cff5adc8a06738251e7ae8c584dff9b07f-primary.sqlite.bz2
Ambari/rhel/7/x86_64/2.1x/GA/2.1/repodata/1fbb5a4f2cbe41039188203f3e58bd9b7c2c28a3181b1af14a5992e9e7583c54-other.sqlite.bz2
Ambari/rhel/7/x86_64/2.1x/GA/2.1/repodata/475de5547d77578f29f8fd3e3fdf98b56b4cff40a59d717cdb147347df34ae3b-primary.xml.gz
Ambari/rhel/7/x86_64/2.1x/GA/2.1/repodata/ebb308e260270ac8092fc942f6b97b359e3989436fd1cb744b8efc9de6b08fb7-filelists.sqlite.bz2
Ambari/rhel/7/x86_64/2.1x/GA/2.1/repodata/40aa5cfd67b38eaea2e77fd98945afccd58341bf6c8183acd3d96f7f1a7ad745-filelists.xml.gz
Ambari/rhel/7/x86_64/2.1x/GA/2.1/ambari.repo

URLs for each Yum repository:

IOP: http://<YUM-Server>/repos/IOP/rhel/7/x86_64/4.1.x/GA/4.1.0.0/

IOP-UTILS: http://<YUM-Server>/repos/IOP-UTILS/rhel/7/x86_64/1.1/

Ambari: http://<YUM-Server>/repos/Ambari/rhel/7/x86_64/2.1.x/GA/2.1/
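The download and extraction in steps 1 and 3 follow a regular URL pattern, so they can be scripted. The following is a minimal sketch: the function name iop_tarball_urls is hypothetical, and the URL pattern is inferred from the entries in Table 2 rather than documented by IBM.

```shell
# Print the three tarball URLs from Table 2 for a RHEL major version and architecture.
# Usage: iop_tarball_urls <6|7> <x86_64|ppc64le>
iop_tarball_urls() {
    local rel="$1"
    local arch="$2"
    local base="https://ibm-open-platform.ibm.com/repos"
    echo "${base}/Ambari/rhel/${rel}/${arch}/2.1.x/GA/2.1/ambari-2.1.0.0.el${rel}.${arch}.tar.gz"
    echo "${base}/IOP/rhel/${rel}/${arch}/4.1.x/GA/4.1.0.0/iop-4.1.0.0.el${rel}.${arch}.tar.gz"
    echo "${base}/IOP-UTILS/rhel/${rel}/${arch}/1.1/iop-utils-1.1.0.0.el${rel}.${arch}.tar.gz"
}

# Example: download and extract all three tarballs into the mirror directory.
#   cd /var/www/html/repos
#   for url in $(iop_tarball_urls 7 x86_64); do
#       curl -O "$url" && tar xzvf "$(basename "$url")"
#   done
```

Only the combinations listed in Table 2 are known to exist; the function does not validate its arguments.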

5.2 The IBM Spectrum Scale Yum repository

Note: If you have already set up an IBM Spectrum Scale file system, you can skip this section.

The following instructions are written for customers deploying IBM Open Platform (IOP) with IBM Spectrum

Scale Advanced Edition, the version that is included with BigInsights Enterprise Management. If you are using

Ambari to install IBM Spectrum Scale, use the Standard or Advanced Edition of IBM Spectrum Scale.

IBM Spectrum Scale Express Edition can be used only if it is installed and configured manually before installing Ambari and IOP. The following list of RPM packages for IBM Spectrum Scale v4.1.1 can help verify the edition of IBM Spectrum Scale.

IBM Spectrum Scale Edition    rpm package list

Express Edition               gpfs.base
                              gpfs.gpl
                              gpfs.docs
                              gpfs.gskit
                              gpfs.msg.en_US
                              gpfs.platform

Standard Edition              <Express Edition rpm list> + gpfs.ext

Advanced Edition              <Standard Edition rpm list> + gpfs.crypto
                              For the IBM Spectrum Scale 4.2 release, add gpfs.adv to the list above

TABLE 3 IBM SPECTRUM SCALE EDITIONS

When you purchase the IBM Spectrum Scale license, get an account for the Passport Advantage Website to download the IBM Spectrum Scale packages.

For internal IBM users, follow the instructions at Software Sellers Workplace.

For customer POC or trial licenses, send an email to [email protected].

The following example uses IBM Spectrum Scale version 4.1.1.

1. On the repository web server, create a directory for your IBM Spectrum Scale repos, such as <document root>/repos/GPFS. For Apache httpd with document root /var/www/html, type the following command:

mkdir -p /var/www/html/repos/GPFS

2. Obtain the IBM Spectrum Scale Software. If you have already installed IBM Spectrum Scale manually, skip

this step. In Passport Advantage, locate and download the software package titled:

IBM Spectrum Scale Advanced 4.1.1

Depending on your license entitlement, the package might be available as part of IBM BigInsights

Enterprise Management Version 4.1 or IBM BigInsights for Apache Hadoop Version 4.1 rather than as a

stand-alone IBM Spectrum Scale package.

Download the file IBM_SPECTRUM_SCALE_ADVLx86_4.1.1.tar.gz (x86_64) or

IBM_SPECTRUM_SCALE_ADV_LxP8_4.1.1.tar.gz (ppc64le), then extract the installer.

As root or a user with sudo privileges:

tar zxvf IBM_SPECTRUM_SCALE_ADVLx86_4.1.1.tar.gz

chmod +x Spectrum_Scale_install-4.1.1.0_x86_64_advanced

./Spectrum_Scale_install-4.1.1.0_x86_64_advanced --dir /var/www/html/repos/GPFS --silent

Note: The --silent option is used to accept the Software License Agreement, and the --dir option places the IBM Spectrum Scale RPMs into the directory /var/www/html/repos/GPFS. If the --dir option is not specified, the default location is /usr/lpp/mmfs/4.1.1.

3. If the RPMs were extracted into the IBM Spectrum Scale default directory, /usr/lpp/mmfs/4.1.1, copy all the IBM Spectrum Scale RPM files into the IBM Spectrum Scale repository path:

cd /usr/lpp/mmfs/4.1.1

cp gpfs*.rpm /var/www/html/repos/GPFS

4. Ensure the directory does not contain any optional IBM Spectrum Scale rpm packages. See IBM Spectrum

Scale Installation Guide for the specific version for more information on base and optional packages.

Ambari and IOP require only the following packages:

gpfs.base

gpfs.gpl

gpfs.docs

gpfs.gskit

gpfs.msg.en_US

gpfs.ext

gpfs.crypto (if Advanced edition is used)

gpfs.adv (if IBM Spectrum Scale 4.2 Advanced edition is used)

5. Delete the gpfs.hadoop-connector RPM from /var/www/html/repos/GPFS.

cd /var/www/html/repos/GPFS

rm gpfs.hadoop-connector*.rpm

Use the newest connector packages downloaded from IBM Spectrum Scale Hadoop Connector.

6. Copy the IBM Spectrum Scale Hadoop Connector RPM to the IBM Spectrum Scale repo path.

cp gpfs.hadoop-connector-2.7.0-(version) /var/www/html/repos/GPFS

WARNING: If you want to upgrade IBM Spectrum Scale from GA level code, do not put the PTF update packages into the IBM Spectrum Scale repo path. The PTF update packages must be installed separately after the cluster has been installed and configured.

7. Check for IBM Spectrum Scale RPMs in the /root/ directory. If the RPMs exist, relocate them to a subdirectory. There are known issues with IBM Spectrum Scale RPMs in /root that cause the Ambari installation to fail.

8. Create the YUM repository

createrepo /var/www/html/repos/GPFS/

# cd /var/www/html/repos/GPFS/
# createrepo .
Spawning worker 0 with 8 pkgs
Workers Finished
Gathering worker results
Saving Primary metadata
Saving file lists metadata
Saving other metadata
Generating sqlite DBs
Sqlite DBs complete

9. Access the repository at http://<YUM-Server>/repos/GPFS.
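Step 4 above asks that the repository directory contain no optional packages. The following is a minimal sketch of such a check, not a command from the original guide: the function name unexpected_gpfs_rpms is hypothetical, and it assumes RPM file names start with the package name followed by a hyphen and the version. The connector package that step 6 copies in is treated as expected.

```shell
# Print RPM files in a repository directory whose package name is not in the
# list from step 4 (plus gpfs.hadoop-connector, which step 6 adds).
# Usage: unexpected_gpfs_rpms <repo directory>
unexpected_gpfs_rpms() {
    local dir="$1"
    local f
    local name
    for f in "$dir"/*.rpm; do
        [ -e "$f" ] || continue    # the directory holds no RPM files
        name=$(basename "$f")
        case "$name" in
            gpfs.base-*|gpfs.gpl-*|gpfs.docs-*|gpfs.gskit-*|gpfs.msg.en_US-*) ;;
            gpfs.ext-*|gpfs.crypto-*|gpfs.adv-*|gpfs.hadoop-connector-*) ;;
            *) echo "$name" ;;
        esac
    done
}

# Example, run before createrepo:
#   unexpected_gpfs_rpms /var/www/html/repos/GPFS
```

Any file that is printed should be moved out of the directory before the repository metadata is created.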

5.3 OS Repository

Because some of the IBM Spectrum Scale RPMs have operating system package dependencies that must be resolved on all nodes, you must also create the operating system repository.

1. Create the repository path:

mkdir /var/www/html/repos/<rhel_OSlevel>

2. Synchronize each local directory with the current yum repository:

cd /var/www/html/repos/<rhel_OSlevel>

Run the following:

reposync --gpgcheck -l --repoid=rhel-7-server-rpms --download_path=/var/www/html/repos/<rhel_OSlevel>

3. Create the repository for this node:

createrepo -v /var/www/html/repos/<rhel_OSlevel>

4. Ensure that all the firewalls are disabled or you have the httpd service port open, because Yum uses http to get the packages from the repository.

5. On all nodes in the cluster that require the repositories, create a file in

/etc/yum.repos.d called local_<rhel_OSlevel>.repo

6. Copy this file to all nodes. The contents of this file must look like the following:

[local_rhel7.1]
name=local_rhel7.1
enabled=yes
baseurl=http://<internal IP that all nodes can reach>/repos/<rhel_OSlevel>
gpgcheck=no

7. Run yum repolist and yum install to confirm that RPMs can be installed without external connections.
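The repository file from steps 5 and 6 can be generated rather than typed on each node. The following is a minimal sketch: the function name make_repo_file is hypothetical, and the IP address in the usage comment is a placeholder, not a value from this guide.

```shell
# Write a yum repository definition to standard output,
# using the same keys and values as the example file above.
# Usage: make_repo_file <repo id> <base URL>
make_repo_file() {
    local id="$1"
    local url="$2"
    cat <<EOF
[${id}]
name=${id}
enabled=yes
baseurl=${url}
gpgcheck=no
EOF
}

# Example (10.0.0.1 stands for the internal IP that all nodes can reach):
#   make_repo_file local_rhel7.1 http://10.0.0.1/repos/rhel7.1 \
#       > /etc/yum.repos.d/local_rhel7.1.repo
```

The generated file can then be copied to every node that needs the repository.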

6. Ambari installation

6.1 Install the Ambari-Server RPM

1. Log on to the Ambari server and create the Ambari YUM repo file, ambari.repo. In this example, the Ambari server is compute000.

Replace this hostname with your Ambari server hostname and use the appropriate value for baseurl for the local repository previously configured.

Perform this step on the Ambari server only.

WARNING: Verify whether your repository is reachable over http:// or https://, and use the matching protocol in baseurl.

[root@smn GPFS]# ssh compute000
[root@compute000 ~]# cat /etc/yum.repos.d/ambari.repo
[BI_AMBARI-2.1.0]
name=ambari-2.1.0
baseurl=http://<Yum-Server>/repos/Ambari/rhel/7/x86_64/2.1.x/GA/2.1
enabled=1
gpgcheck=0
[root@compute000 ~]# yum clean all
[root@compute000 ~]# yum makecache

2. Use yum to install the ambari-server rpm:

yum -y install ambari-server

[root@compute000 ~]# yum -y install ambari-server
Loaded plugins: product-id, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
BI_AMBARI-2.1.0                                        | 2.9 kB  00:00:00
xCAT-rhels7.1-path0                                    | 4.1 kB  00:00:00
xcat-otherpkgs0                                        | 2.9 kB  00:00:00
Resolving Dependencies
--> Running transaction check
---> Package ambari-server.x86 0:2.1.0_IBM-3 will be installed
--> Processing Dependency: postgresql-server >= 8.1 for package: ambari-server-2.1.0_IBM-3.x86
--> Running transaction check
---> Package postgresql-server.x86 0:9.2.7-1.ael7b will be installed
--> Processing Dependency: postgresql(ppc-64) = 9.2.7-1.ael7b for package: postgresql-server-9.2.7-1.ael7b.x86
--> Processing Dependency: postgresql-libs(ppc-64) = 9.2.7-1.ael7b for package: postgresql-server-9.2.7-1.ael7b.x86
--> Processing Dependency: libpq.so.5()(64bit) for package: postgresql-server-9.2.7-1.ael7b.x86
--> Running transaction check
---> Package postgresql.x86 0:9.2.7-1.ael7b will be installed
---> Package postgresql-libs.x86 0:9.2.7-1.ael7b will be installed
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package             Arch   Version         Repository                     Size
================================================================================
Installing:
 ambari-server       x86    2.1.0_IBM-3     BI_AMBARI-2.1.0-20150802_1728  341 M
Installing for dependencies:
 postgresql          x86    9.2.7-1.ael7b   xCAT-rhels7.1-path0            3.0 M
 postgresql-libs     x86    9.2.7-1.ael7b   xCAT-rhels7.1-path0            241 k
 postgresql-server   x86    9.2.7-1.ael7b   xCAT-rhels7.1-path0            4.1 M

Transaction Summary
================================================================================
Install  1 Package (+3 Dependent packages)

Total download size: 348 M
Installed size: 404 M
Downloading packages:
(1/4): postgresql-libs-9.2.7-1.ael7b.x86.rpm           | 241 kB  00:00:00
(2/4): postgresql-9.2.7-1.ael7b.x86.rpm                | 3.0 MB  00:00:00
(3/4): postgresql-server-9.2.7-1.ael7b.x86.rpm         | 4.1 MB  00:00:00
(4/4): ambari-server-2.1.0_IBM-3.x86.rpm               | 341 MB  00:00:05
--------------------------------------------------------------------------------
Total                                         59 MB/s | 348 MB  00:00:05
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : postgresql-libs-9.2.7-1.ael7b.x86                           1/4
  Installing : postgresql-9.2.7-1.ael7b.x86                                2/4
  Installing : postgresql-server-9.2.7-1.ael7b.x86                         3/4
  Installing : ambari-server-2.1.0_IBM-3.x86                               4/4
  Verifying  : postgresql-libs-9.2.7-1.ael7b.x86                           1/4
  Verifying  : postgresql-9.2.7-1.ael7b.x86                                2/4
  Verifying  : postgresql-server-9.2.7-1.ael7b.x86                         3/4
  Verifying  : ambari-server-2.1.0_IBM-3.x86                               4/4

Installed:
  ambari-server.x86 0:2.1.0_IBM-3

Dependency Installed:
  postgresql.x86 0:9.2.7-1.ael7b   postgresql-libs.x86 0:9.2.7-1.ael7b   postgresql-server.x86 0:9.2.7-1.ael7b

Complete!

6.2 Install the IBM Spectrum Scale Ambari integration module

1. Install the gpfs.ambari integration module onto the Ambari server node:

chmod 755 gpfs.ambari-iop-<version>.noarch.bin

./gpfs.ambari-iop-<version>.noarch.bin

Note: To avoid potential problems, do not put the gpfs-ambari integration package in /root/.

[root@compute000 ~]# chmod 755 gpfs.ambari-iop_4.1-2.3.noarch.bin
[root@compute000 ~]# ./gpfs.ambari-iop_4.1-2.3.noarch.bin
International License Agreement for Non-Warranted Programs
Part 1 - General Terms
BY DOWNLOADING, INSTALLING, COPYING, ACCESSING, CLICKING ON AN "ACCEPT" BUTTON, OR OTHERWISE USING THE PROGRAM, LICENSEE AGREES TO THE TERMS OF THIS AGREEMENT. IF YOU ARE ACCEPTING THESE TERMS ON BEHALF OF LICENSEE, YOU REPRESENT AND WARRANT THAT YOU HAVE FULL AUTHORITY TO BIND LICENSEE TO THESE TERMS. IF YOU DO NOT AGREE TO THESE TERMS,
* DO NOT DOWNLOAD, INSTALL, COPY, ACCESS, CLICK ON AN "ACCEPT" BUTTON, OR USE THE PROGRAM; AND
* PROMPTLY RETURN THE UNUSED MEDIA AND DOCUMENTATION TO THE PARTY FROM WHOM IT WAS OBTAINED FOR A REFUND OF THE AMOUNT PAID. IF THE PROGRAM WAS DOWNLOADED, DESTROY ALL COPIES OF THE PROGRAM.
…...
Z125-5589-05 (07/2011)
Do you agree to the above license terms? [yes or no]
yes
Unpacking...
Done
Installing...
Preparing...                          ################################# [100%]
Updating / installing...
   1:gpfs.ambari-iop_4.1-2.3          ################################# [100%]

2. Update the Ambari configuration file to use the local repository.

If a cloned local Yum repository is used, the Ambari configuration file must be updated before setting up the Ambari server.

Update the value of openjdk1.8.url and openjdk1.7.url in /etc/ambari-server/conf/ambari.properties.

Instead of ibm-open-platform.ibm.com, specify the hostname of your local repository server. Also check

the protocol type (http vs. https).

[root@compute000 scripts]# cat /etc/ambari-server/conf/ambari.properties |grep ibm-open
openjdk1.8.url=http://<Yum server>/repos/IOP-UTILS/RHEL7/x86/1.1/openjdk/jdk-1.8.0.tar.gz
openjdk1.7.url=http://<Yum server>/repos/IOP-UTILS/RHEL7/x86/1.1/openjdk/jdk-1.7.0.tar.gz

3. Update the number of threads used by the Ambari server and Ambari agent to match the number of CPUs on the nodes. In this example, the value is updated to 160:

[root@compute001 ~]# nproc

160

[root@compute001 ~]# grep -i thread /etc/ambari-server/conf/ambari.properties

agent.threadpool.size.max=160

server.execution.scheduler.maxThreads=160

client.threadpool.size.max=160

4. Update the Spark params.py file in the Ambari server stack definition so that the Spark History Service can be started.

The parameter spark_eventlog_dir_mode is 01777 by default, which causes permission issues when starting the Spark History Service. The workaround is to change the value to 0777. This change can be made at any time, before or after the initial deployment.

Note: If you have already set up Ambari, restart the Ambari server after making this change.

vim /var/lib/ambari-server/resources/stacks/BigInsights/4.1/services/SPARK/package/scripts/params.py

spark_hdfs_user_dir = format("/user/{spark_user}")
spark_hdfs_user_mode = 0755
spark_eventlog_dir_mode = 0777
spark_jar_hdfs_dir = "/iop/apps/4.1.0.0/spark/jars"
spark_jar_hdfs_dir_mode = 0755
spark_jar_file_mode = 0444
spark_jar_src_dir = "/usr/iop/current/spark-client/lib"
spark_jar_src_file = "spark-assembly.jar"

5. Update the hive.py file in the Ambari server stack definition to change the permission of Hive's data warehouse directory.

The data warehouse directory is specified as hive.metastore.warehouse.dir. The default directory is /apps/hive/warehouse. When the Hive service starts, the permission of this directory is reset to 770 (rwx rwx ---) and the directory is owned by hive:hadoop. Therefore, users from other groups cannot access the directory and cannot create any Hive database or table under the warehouse.

Make the following change to allow other users to create Hive databases or tables:

vim /var/lib/ambari-server/resources/stacks/BigInsights/4.0/services/HIVE/package/scripts/hive.py

params.HdfsResource(params.hive_apps_whs_dir,
                    type="directory",
                    action="create_on_execute",
                    owner=params.hive_user,
                    group=params.user_group,
                    mode=0777
)

This change can also be made after the initial deployment. In either case, the Ambari server must be restarted for the change to take effect.

6. Update the Yum repository file:

/var/lib/ambari-server/resources/stacks/BigInsights/4.1.SpectrumScale/repos/repoinfo.xml

You can change the repo info in the Ambari GUI later, but those changes are not saved into the repoinfo.xml file. At the next restart of the Ambari server, Ambari reads repoinfo.xml and loads its contents into the database, so any changes made only in the GUI are lost.

[root@compute000 ~]# cat /var/lib/ambari-server/resources/stacks/BigInsights/4.1.SpectrumScale/repos/repoinfo.xml
<?xml version="1.0"?>
<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements. See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License. You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
-->
<reposinfo>
  <mainrepoid>IOP-4.1</mainrepoid>
  <os family="redhat6">
    <repo>
      <baseurl>http://<Yum server>/repos/IOP/RHEL6/x86_64/4.1</baseurl>
      <repoid>IOP-4.1</repoid>
      <reponame>IOP</reponame>
    </repo>
    <repo>
      <baseurl>http://<Yum server>/repos/IOP-UTILS/RHEL6/ppc64le/1.1</baseurl>
      <repoid>IOP-UTILS-1.0</repoid>
      <reponame>IOP-UTILS</reponame>
    </repo>
  </os>
</reposinfo>
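The edits to ambari.properties in steps 2 and 3 can be applied with a small helper instead of a text editor. The following is a minimal sketch under stated assumptions: the function name set_property is hypothetical, the file is assumed to hold plain key=value lines, and a replaced key is moved to the end of the file.

```shell
# Set key=value in a properties file, replacing an existing entry for the key
# or appending a new one. The file is rewritten through a temporary copy.
# Usage: set_property <file> <key> <value>
set_property() {
    local file="$1"
    local key="$2"
    local value="$3"
    local tmp
    tmp=$(mktemp) || return 1
    # Keep every line that does not start with "<key>=", then append the new entry.
    awk -v k="${key}=" 'index($0, k) != 1' "$file" > "$tmp" &&
        printf '%s=%s\n' "$key" "$value" >> "$tmp" &&
        cat "$tmp" > "$file"
    local rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Example: set the three thread-pool values from step 3 to the CPU count.
#   props=/etc/ambari-server/conf/ambari.properties
#   cp "$props" "$props.bak"
#   for key in agent.threadpool.size.max server.execution.scheduler.maxThreads client.threadpool.size.max; do
#       set_property "$props" "$key" "$(nproc)"
#   done
```

The same helper can set openjdk1.8.url and openjdk1.7.url to the local repository URLs from step 2. Keeping a backup copy of the file, as in the example, makes the change easy to revert.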

6.3 Setting up the Ambari server

Run the setup command to configure your Ambari Server, Database, JDK, LDAP, and other options:

ambari-server setup

[root@compute000 ~]# ambari-server setup
Using python /usr/bin/python2.7
Setup ambari-server
Checking SELinux...
SELinux status is 'disabled'
Customize user account for ambari-server daemon [y/n] (n)? n
Adjusting ambari-server permissions and ownership...
Checking firewall status...
Redirecting to /bin/systemctl status iptables.service
Checking JDK...
[1] OpenJDK 1.8.0
[2] OpenJDK 1.7.0 (deprecated)
[3] Custom JDK
==============================================================================
Enter choice (1): 1   <==== Note: JDK 1.8.0 or greater is required to run Spark applications with Platform Symphony on the ppc64le platform
Downloading JDK from http://<Yum server>/repo/IOP-UTILS_1.1/openjdk/jdk-1.8.0.tar.gz to /var/lib/ambari-server/resources/jdk-1.8.0.tar.gz
jdk-1.8.0.tar.gz... 100% (48.3 MB of 48.3 MB)
Successfully downloaded JDK distribution to /var/lib/ambari-server/resources/jdk-1.8.0.tar.gz
Installing JDK to /usr/jdk64/
Successfully installed JDK to /usr/jdk64/
Completing setup...
Configuring database...
Enter advanced database configuration [y/n] (n)? n
Configuring database...
Default properties detected. Using built-in database.
Configuring ambari database...
Checking PostgreSQL...
Running initdb: This may take upto a minute.
Initializing database ... OK
About to start PostgreSQL
Configuring local database...
Connecting to local database...done.
Configuring PostgreSQL...
Restarting PostgreSQL
Extracting system views...
....ambari-admin-2.1.0_IBM.jar
..
Adjusting ambari-server permissions and ownership...
Ambari Server 'setup' completed successfully.

6.4 Starting the Ambari server

The Ambari server uses port 8080 by default. If any other service is using this port, another port can be assigned to Ambari. To change the default port of Ambari, change or add the following line in /etc/ambari-server/conf/ambari.properties:

client.api.port=<port_number>

The port number can also be changed later: set the port you want in the /etc/ambari-server/conf/ambari.properties file, and then run ambari-server restart.
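When it is unclear which port an existing Ambari server uses, the configured value can be read back from the properties file. The following is a minimal sketch: the function name ambari_client_port is hypothetical, and it assumes that a missing client.api.port entry means the default port 8080 is in effect, as described above.

```shell
# Print the Ambari client port configured in a properties file,
# falling back to the default 8080 when client.api.port is not set.
# Usage: ambari_client_port [properties file]
ambari_client_port() {
    local file="${1:-/etc/ambari-server/conf/ambari.properties}"
    local port
    # If the key appears more than once, the last entry is used.
    port=$(sed -n 's/^client\.api\.port=//p' "$file" 2>/dev/null | tail -n 1)
    echo "${port:-8080}"
}

# Example: build the console URL for the current host.
#   echo "http://$(hostname -f):$(ambari_client_port)"
```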

PostgreSQL is used by the Ambari server to store the cluster configuration information. Optionally, ensure that it restarts after a reboot:

chkconfig postgresql on

Then, start Ambari server:

ambari-server start

[root@compute000 ~]# ambari-server start
Using python /usr/bin/python2.7
Starting ambari-server
Ambari Server running with administrator privileges.
Organizing resource files at /var/lib/ambari-server/resources...
Server PID at: /var/run/ambari-server/ambari-server.pid
Server out at: /var/log/ambari-server/ambari-server.out
Server log at: /var/log/ambari-server/ambari-server.log
Waiting for server start....................
Ambari Server 'start' completed successfully.

7. Install IOP with IBM Spectrum Scale using Ambari

7.1 Before you begin

Set up passwordless ssh access for root.

Before the installation, configure the root passwordless access from the IBM Spectrum Scale master node to all other IBM Spectrum Scale nodes. This is required for IBM Spectrum Scale.

The following steps set up passwordless access for root:

a. Define Node1 as the IBM Spectrum Scale master.

b. Log on to Node1 as the root user.

# cd /root/.ssh

c. Generate a public/private authentication key pair. Do not type a passphrase.

# ssh-keygen -t rsa
Generating the public-private rsa key pair.
Type the name of the file in which you want to save the key (/root/.ssh/id_rsa):
Type the passphrase.
Type the same passphrase again.
The identification has been saved in /root/.ssh/id_rsa.
The public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:

Note: During ssh-keygen -t rsa, accept the default for all prompts.

d. Add the public key to the authorized_keys file:

# cd /root/.ssh/; cat id_rsa.pub > authorized_keys

e. Copy the generated key files to nodeX:

# scp /root/.ssh/* root@nodeX:/root/.ssh

f. Make sure that the permissions of the public key file are correct:

# ssh root@nodeX "chmod 700 .ssh; chmod 640 .ssh/authorized_keys"

g. Check passwordless access

# ssh node2

[root@node1 ~]# ssh node2

The authenticity of host 'gpfstest9 (192.168.10.9)' can't be established.

RSA key fingerprint is 03:bc:35:34:8c:7f:bc:ed:90:33:1f:32:21:48:06:db.

Are you sure you want to continue connecting (yes/no)? yes

Note: You also need to run "ssh node1" to add the key to /root/.ssh/known_hosts for passwordless access.
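The check in step g can be repeated for every node in one pass. The following is a minimal sketch, not a command from the original guide: the function name nodes_without_passwordless_ssh and the SSH_CMD variable are hypothetical. SSH_CMD exists only so that the ssh client can be replaced during a dry run; it defaults to ssh.

```shell
# Print every node that cannot be reached as root over SSH without a prompt.
# Usage: nodes_without_passwordless_ssh node1 node2 ...
nodes_without_passwordless_ssh() {
    local node
    for node in "$@"; do
        # BatchMode=yes makes ssh fail instead of asking for a password or
        # for confirmation of an unknown host key.
        ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "root@${node}" true \
            </dev/null >/dev/null 2>&1 || echo "$node"
    done
}

# Example, run as root on the IBM Spectrum Scale master node:
#   nodes_without_passwordless_ssh node1 node2 node3
```

A node is also reported when its host key is not yet in /root/.ssh/known_hosts, which matches the note above about connecting to each node once.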

If you have pre-installed a Spectrum Scale file system:

a. Ensure that IBM Spectrum Scale is set to automount on reboot

mmchfs <device> -A yes

b. Ensure that the IBM Spectrum Scale cluster is started on all nodes

/usr/lpp/mmfs/bin/mmstartup -a

c. Ensure that the IBM Spectrum Scale filesystem is mounted on all nodes

mmmount <fs-name> -a

d. Ensure that no IBM Spectrum Scale NSD stanza file called gpfs_nsd (default expected name) exists under /var/lib/ambari-server/resources/ on the Ambari server node

e. If ESS is used as the shared storage in an Ambari cluster, create a shared node information file for the ESS cluster called

/var/lib/ambari-server/resources/shared_gpfs_node.cfg

on the Ambari server. This file must contain only one hostname, which is the hostname of a node in the ESS cluster. Ambari uses this one node to join the ESS cluster. Passwordless SSH must be configured from the Ambari server to this node.

[root@compute000 scripts]# cat /var/lib/ambari-server/resources/shared_gpfs_node.cfg
compute002
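The single-hostname rule for shared_gpfs_node.cfg can be checked before the wizard is started. This is a small Python sketch under the assumption that the file holds nothing but the hostname; read_shared_node is a hypothetical helper, not an Ambari function.

```python
def read_shared_node(cfg_text):
    """Return the single hostname held in shared_gpfs_node.cfg content.

    Raises ValueError if the content does not hold exactly one hostname.
    """
    # Ignore blank lines so that a trailing newline is not counted as an entry.
    entries = [line.strip() for line in cfg_text.splitlines() if line.strip()]
    if len(entries) != 1:
        raise ValueError(f"expected exactly one hostname, found {len(entries)}")
    if len(entries[0].split()) != 1:
        raise ValueError("the hostname line must not contain whitespace")
    return entries[0]
```

Passing the content of the file shown above returns compute002; a file that lists two nodes raises ValueError.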

If you are using Ambari to deploy a new IBM Spectrum Scale FPO file system, prepare a GPFS NSD Stanza

file as described in Preparing a stanza File and add it to /var/lib/ambari-server/resources/ on the Ambari

server node. The default file name is gpfs_nsd.

Because there are likely repository changes in the environment, clear all the yum packages and headers

from the cache on all nodes.

yum clean all; yum makecache

7.2 Ambari Wizard

Open a browser (Firefox or Internet Explorer) to log on to the Ambari administrator console at

http://<ambari-server-host-name>:8080.

Platform Cluster Manager (PCM) uses port 8080. Therefore, IBM customers using both PCM and Ambari on the same host require a port change. The ISH and IDEA solutions have this configuration.

If you do not see the Ambari console, the default client port (8080) might have been changed by setting client.api.port in /etc/ambari-server/conf/ambari.properties.
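To find out which port the console is listening on, the client.api.port entry can be read from the properties file. The following Python sketch assumes the usual key=value layout of a properties file; ambari_client_port is a hypothetical helper name.

```python
def ambari_client_port(properties_text, default=8080):
    """Return client.api.port from ambari.properties content, or the default."""
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "client.api.port":
            return int(value.strip())
    return default
```

The function takes the file content as a string, so it can be fed with the text of /etc/ambari-server/conf/ambari.properties read on the Ambari server.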

The default Ambari account is admin:admin.

FIGURE 1 AMBARI LOGIN

7.3 Create a cluster

Welcome Screen

Click Create a Cluster > Launch Install Wizard.

FIGURE 2 AMBARI WELCOME PAGE

Cluster Name

Type a name for the cluster.

FIGURE 3 AMBARI CLUSTER NAME

Select Stack

Click Next. The BigInsights stacks now appear on the Select Stack page.

Stack Name                            Description
BigInsights 4.1 IBM Spectrum Scale    Installs BigInsights and IBM Spectrum Scale at the same time
BigInsights 4.1                       Installs BigInsights with HDFS

Note: Expand the Advanced Repository Options section to review the repository settings. Ensure that

the local mirror repository configured is correct.

If you are installing on RHEL6, you can uncheck the repository information for RHEL7. If you are installing on RHEL7, you can uncheck the repository information for RHEL6.

Validate the base URLs for all the local repositories: IBM Spectrum Scale, IOP, and IOP-UTILS.

Note: If you want to use the public BigInsights IOP 4.1 repository, such as http://ibm-open-platform.ibm.com/repos/IOP/rhel/6/x86_64/4.1.x/GA/4.1.0.0/, and IOP-UTILS 1.1, such as http://ibm-open-platform.ibm.com/repos/IOP-UTILS/rhel/6/x86_64/1.1/, ensure that all the nodes in the cluster can access the internet. In this mode, installation might take more time because all the RPM packages must be downloaded during installation.

FIGURE 4 AMBARI SELECT STACK

Install Options

On the Install Options screen, in the Target Hosts section, type the list of fully qualified domain names (FQDNs) of the nodes in the cluster. For SSH Private Key, upload or copy and paste the key from /root/.ssh/id_rsa on the Ambari server.

FIGURE 5 AMBARI INSTALL OPTIONS – HOST LIST

Confirm Hosts

On the Confirm Hosts screen, click the Register and Confirm button. Ambari installs agents and the selected software onto all the specified nodes and performs some basic verification checks. Note that because you pre-created the entire set of IOP user IDs, the system displays a warning that the IDs already exist. However, fix all the errors to ensure that the prerequisites have been met.

FIGURE 6 AMBARI CONFIRM HOSTS

Choose Services

On the Choose Services screen, select the services to be installed.

Note: It is possible to select only IBM Spectrum Scale and to leave all other services unchecked for the

purposes of using Ambari to deploy a general purpose IBM Spectrum Scale cluster. However, the

configuration settings for IBM Spectrum Scale will be Hadoop-centric and might not be appropriate for

other types of workloads. Any services not selected for installation.

FIGURE 7 AMBARI CHOOSE SERVICES

Assign Masters

On the Assign Masters screen, services that belong on a master / management / edge node are

presented. Select the location where you want to deploy the master node services. Note the IBM

Spectrum Scale Master and consider its placement.

FIGURE 8 AMBARI ASSIGN MASTERS

The IBM Spectrum Scale Master Node is the node from which Ambari issues commands that affect the entire cluster. For example, when IBM Spectrum Scale is first installed and an FPO cluster is first created, all the commands are executed on the IBM Spectrum Scale Master Node. This is therefore the node from which passwordless SSH must be set up to every node. As another example, if configuration changes are made after the cluster has been deployed, the IBM Spectrum Scale Master Node executes the commands to reconfigure the cluster and, if necessary, restarts IBM Spectrum Scale on all nodes. The term Master follows the convention used by the other Hadoop services. The IBM Spectrum Scale Master Node has no special role in the IBM Spectrum Scale cluster, other than being one of the quorum nodes.

Important: If the IBM Spectrum Scale cluster has been created, a quorum node must be selected as the

IBM Spectrum Scale master node.

There are recommendations for assigning node roles on the BigInsights Knowledge Center at

http://www.ibm.com/support/knowledgecenter/SSPT3X/SSPT3X_welcome.html

Assign Slaves and Clients

On the Assign Slaves and Clients screen, select the client and slave components that are to be

deployed across the data nodes.

Important:

If you want to install Big SQL, install all clients on all nodes including the management nodes.

For the IBM Spectrum Scale node client, select all the nodes, including all management nodes. They are selected on each node by default. This ensures that both the IBM Spectrum Scale node and the IBM Spectrum Scale Hadoop Connector run on every node.

FIGURE 9 AMBARI ASSIGN SLAVES AND CLIENTS

Customize Services

On the Customize Services screen, IBM Spectrum Scale has its own tab that must be reviewed

carefully. There are two tabs: Standard and Advanced configuration.

If a new IBM Spectrum Scale cluster is being created, configuration fields on both tabs are populated with values taken from the IBM Spectrum Scale for Hadoop Best Practices White Paper at References: Deploy BigInsights IOP 4.1 over IBM Spectrum Scale.

In the Standard tab, users can adjust parameters via slider bars or drop-down menus. The Advanced tab contains parameters that do not need to be changed frequently.

IMPORTANT: Read and follow the IBM Spectrum Scale deployment modes section FIRST to understand how the IBM Spectrum Scale mode onto which you are deploying IOP affects the system and the Standard and Advanced tabs in the Ambari wizard.

The following checklist covers the important IBM Spectrum Scale parameters:

Standard tab                            Rule
Cluster Name
FileSystem Name
FileSystem Mount Point
NSD stanza file                         See guide in Deploy IOP over new IBM Spectrum Scale file system (FPO support only)
Policy file                             See guide in Deploy IOP over new IBM Spectrum Scale file system (FPO support only)
Hadoop local cache disk stanza file     See guide in Deploy IOP over existing IBM Spectrum Scale file system (ESS)
Default Metadata Replicas               <= Max Metadata Replicas
Default Data Replicas                   <= Max Data Replicas
Max Metadata Replicas
Max Data Replicas

Advanced tab                                Rule
Advanced core-site: fs.defaultFS            Make sure 'hdfs://localhost:8020' is used
Advanced gpfs-advance: gpfs.quorum.nodes    The number of nodes must be odd

TABLE 4 IBM SPECTRUM SCALE CHECKLIST PARAMETERS
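The replica and quorum rules in the checklist are simple enough to check mechanically before clicking Next. The following Python sketch is only an illustration: the dictionary keys and the function name are hypothetical and do not correspond to Ambari property names.

```python
def check_spectrum_scale_settings(settings, quorum_nodes):
    """Return a list of checklist violations (empty if all rules hold)."""
    problems = []
    if settings["default_metadata_replicas"] > settings["max_metadata_replicas"]:
        problems.append("Default Metadata Replicas exceeds Max Metadata Replicas")
    if settings["default_data_replicas"] > settings["max_data_replicas"]:
        problems.append("Default Data Replicas exceeds Max Data Replicas")
    # The checklist asks for an odd number of quorum nodes.
    if len(quorum_nodes) % 2 == 0:
        problems.append("gpfs.quorum.nodes must list an odd number of nodes")
    return problems
```

An empty result means that the three rules shown in the table hold; it does not validate the remaining fields.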

FIGURE 10 AMBARI CUSTOMIZE SERVICE IOP TABS

Note:

Check the IBM Spectrum Scale values and all the services marked with a red circle. The red circle denotes mandatory entries that must be completed before the service can be deployed.

FIGURE 11 AMBARI IBM SPECTRUM SCALE CUSTOMIZE SERVICES STANDARD AND ADVANCED SETTINGS

FIGURE 12 AMBARI IBM SPECTRUM SCALE CUSTOM SERVICES ADVANCE LIST

Verify the configuration for IBM Spectrum Scale service.

If you have already created the IBM Spectrum Scale cluster and are using Ambari to deploy IOP and

Hadoop integration modules, IBM Spectrum Scale Hadoop Connector and IBM Spectrum Scale Ambari

Module, the fields are populated by using values detected from the existing cluster.

The parameters with a lock icon must not be changed after deployment. These include parameters like

the cluster name, remote shell, filesystem name, and max data replicas. Therefore, double check all the

parameters with the lock icon before proceeding to the next step. Further, while every attempt is

made to detect the correct values from the cluster, verify that parameters are imported properly and

make corrections as needed.

FIGURE 13 AMBARI IBM SPECTRUM SCALE DATA AND METADATA REPLICAS

Review the parameters for Max Data Replicas and Max Metadata Replicas, because these values cannot be changed after the file system is created. If you decrease the values from the default of 3, ensure that this is really what you want. Also, setting Max Data Replicas, Max Metadata Replicas, Default Data Replicas, and Default Metadata Replicas to three implies that there are at least three failure groups in your cluster (at least three nodes with disks); otherwise, the file system creation will fail.

Ambari can select mounted paths when local paths must be used. When a shared file system is

selected by Ambari, some services will not function properly. The following directories can be affected

by this shared file system issue. Therefore, verify that the configuration directories are in a local

filesystem. They will all use the same file system, so if one directory is good, they should all be good.

Check the following parameters:

HBase > Advanced (Tab) > hbase-site: Ensure that the HBase local directory is not a shared mount point. It should point to a node-local directory such as /hadoop/hbase.

YARN > Advanced (Tab) > Node Manager: Ensure that yarn.nodemanager.log-dirs and yarn.nodemanager.local-dirs are not shared mount points. They must point to node-local directories such as /hadoop/yarn/local.

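YARN > Advanced (Tab) > Application Timeline Server: Ensure that yarn.timeline-service.leveldb-timeline-store.path does not use a shared mount point.

Suggested local paths: use /hadoop/yarn/timeline for the timeline store, /hadoop/yarn/log for the NodeManager log directories, and /hadoop/yarn/local for the NodeManager local directories.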

Oozie > Oozie Server: Ensure that Oozie.Data.Dir does not use a shared mount point. For example, use /hadoop/oozie/data.

ZooKeeper > ZooKeeper Server: Ensure that the ZooKeeper directory does not use a shared mount point. For example, use /hadoop/zookeeper.

Kafka Broker: Ensure that log.dirs is a local path. For example, use /hadoop/kafka-logs.
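The directory checks above all ask the same question: does a configured path fall under the shared file system mount point? A minimal Python sketch of that test follows. The mount point /gpfs/bigfs in the usage note is a hypothetical example, and the helper names are not part of Ambari.

```python
import posixpath


def is_under_mount(path, mount_point):
    """Return True if path is the mount point itself or lies below it."""
    path = posixpath.normpath(path)
    mount_point = posixpath.normpath(mount_point)
    # Compare on whole path components so /gpfs/bigfs2 is not matched by /gpfs/bigfs.
    return path == mount_point or path.startswith(mount_point.rstrip("/") + "/")


def find_shared_paths(config, shared_mount):
    """Return the property names whose comma-separated directories touch the shared mount."""
    offenders = []
    for name, value in config.items():
        dirs = [d.strip() for d in value.split(",") if d.strip()]
        if any(is_under_mount(d, shared_mount) for d in dirs):
            offenders.append(name)
    return offenders
```

For example, with a shared mount point of /gpfs/bigfs, a value of /gpfs/bigfs/yarn/log for yarn.nodemanager.log-dirs would be reported, while /hadoop/yarn/local would not.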

7.4 Starting deployment

Review

After verifying all the services on the Review page, click Deploy to begin the installation.

FIGURE 14 AMBARI DEPLOYMENT REVIEW

7.5 IBM Spectrum Scale deployment modes

IBM Spectrum Scale can be deployed in three different modes. Follow the steps that pertain to your file system

setup requirements.

Modes:

IOP over existing IBM Spectrum Scale file system (FPO)

IOP over existing IBM Spectrum Scale file system (ESS)

IOP over new IBM Spectrum Scale cluster (FPO support only)

Deploy IOP over existing IBM Spectrum Scale file system (FPO)

If you choose this mode, start the IBM Spectrum Scale cluster by running

mmstartup -a

on any one node in the IBM Spectrum Scale cluster. Then mount the file system on all nodes by running

mmmount <fs-name> -a

Ensure that the IBM Spectrum Scale NSD stanza file, gpfs_nsd, does not exist under /var/lib/ambari-server/resources/ on the Ambari server node.

If you have not yet started the IBM Spectrum Scale cluster but have already reached the Assign Slaves and Clients panel in Ambari (Figure 9), click the Previous button to go back to the Assign Masters panel. Start the IBM Spectrum Scale cluster and mount the file system on all the nodes. Then return to the Ambari GUI and continue to the Assign Slaves and Clients panel.

Ambari detects the mounted file system and reflects it on the Customize Services page for IBM Spectrum Scale.

Deploy IOP over existing IBM Spectrum Scale file system (ESS)

If you choose this mode, start ESS and set up passwordless SSH login from the Ambari server to one of the nodes in the ESS IBM Spectrum Scale cluster. A configuration file named /var/lib/ambari-server/resources/shared_gpfs_node.cfg must be created on the Ambari server. This file must contain the hostname of a node that is part of the ESS cluster.

Ensure that the IBM Spectrum Scale NSD stanza file, gpfs_nsd, does not exist under /var/lib/ambari-server/resources/ on the Ambari server node.

If you have not yet started the IBM Spectrum Scale cluster but have already reached the Assign Slaves and Clients panel in Ambari (Figure 9), click the Previous button to go back to the Assign Masters panel. Start the IBM Spectrum Scale cluster and mount the file system on all the nodes. Then return to the Ambari GUI and continue to the Assign Slaves and Clients panel.

Ambari automatically detects the mounted file system and reflects it on the Customize Services page for IBM Spectrum Scale.

In this mode, Ambari can create local cache disks for Hadoop usage. Create the following file:

[root@compute000 GPFS]# cat /var/lib/ambari-server/resources/hadoop_disk
DISK|compute001.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg,/dev/sdi,/dev/sdj,/dev/sdk,/dev/sdl,/dev/sdm,/dev/sdn,/dev/sdo,/dev/sdp
DISK|compute002.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg,/dev/sdi,/dev/sdj,/dev/sdk,/dev/sdl,/dev/sdm,/dev/sdn,/dev/sdo,/dev/sdp
DISK|compute003.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg,/dev/sdi,/dev/sdj,/dev/sdk,/dev/sdl,/dev/sdm,/dev/sdn,/dev/sdo,/dev/sdp
DISK|compute005.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg,/dev/sdi,/dev/sdj,/dev/sdk,/dev/sdl,/dev/sdm,/dev/sdn,/dev/sdo,/dev/sdp
DISK|compute006.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg,/dev/sdi,/dev/sdj,/dev/sdk,/dev/sdl,/dev/sdm,/dev/sdn,/dev/sdo,/dev/sdp

Enter the file name in the Hadoop local cache disk stanza file field on the Customize Services page.

FIGURE 15 AMBARI IBM SPECTRUM SCALE HADOOP LOCAL CACHE FILE STANZA

Note: If you are not using shared storage, you do not need this configuration and you can leave this parameter

unchanged in the Ambari GUI.

Additional steps for deploying IOP over existing IBM Spectrum Scale file system – FPO or ESS

For a pre-created IBM Spectrum Scale cluster, please review this section carefully.

The IBM Spectrum Scale NSD stanza file is not required because the file system already exists. Because Ambari does not allow a blank value, leave the IBM Spectrum Scale NSD stanza file field at its default value.

Deploy IOP over new IBM Spectrum Scale file system (FPO support only)

To deploy onto a new IBM Spectrum Scale FPO cluster, complete the following setup points:

Prepare an IBM Spectrum Scale NSD stanza file

Two types of NSD files are supported for file system auto-creation. One is the preferred simple format, and the other is the standard IBM Spectrum Scale NSD file format for IBM Spectrum Scale experts.

If a simple NSD file is used, Ambari selects the proper metadata and data ratio for you. If possible, Ambari creates partitions on some disks for Hadoop intermediate data, which improves the Hadoop performance.

If the standard IBM Spectrum Scale NSD file is used, administrators are responsible for storage space arrangement. One policy file is also required when the standard IBM Spectrum Scale NSD file is used.

Partition algorithm

Algorithm for system pool and usage.

Failure group selection rule

Failure groups are created dependent on the rack location of the node.

Rack Mapping File

Nodes can be defined to belong to racks.

Partitioning the function matrix

A data disk is divided into two partitions because one partition is used for ext3/ext4 to store the map/reduce intermediate data, while the other partition is used as a data disk in the IBM Spectrum Scale file system. Only data disks can be partitioned; metadata disks cannot.

For more information on each of the setup points, refer to the Appendix sections Preparing a stanza File and IBM Spectrum Scale-FPO Deployment.

7.6 Verify and Test Installation

For an initial installation through Ambari, the UID/GID of these users is consistent across all nodes. However, if you deploy a second time, or if some nodes already have these users created with some of the UID/GID values above, the UID/GID of these users might not be consistent across all nodes, per issue AMBARI-10186 from the Ambari community.

After deployment, while verifying the system, run

mmdsh -N all id <user-name>

to see whether the UID is consistent across all nodes.
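When the cluster has many nodes, comparing the id output by eye is error-prone. The following Python sketch parses captured output and reports whether all nodes agree. It assumes that each output line has the form "<host>: uid=<number>(...) ...", that is, the id output prefixed with the node name; verify this against the actual mmdsh output on your system. The function names are hypothetical.

```python
import re

# Assumed line format: "<host>:  uid=1005(hdfs) gid=1002(hadoop) groups=..."
UID_PATTERN = re.compile(r"^(?P<host>[^:\s]+):\s+uid=(?P<uid>\d+)")


def uids_by_host(mmdsh_output):
    """Map each host to the numeric UID reported by id on that host."""
    result = {}
    for line in mmdsh_output.splitlines():
        match = UID_PATTERN.match(line.strip())
        if match:
            result[match.group("host")] = int(match.group("uid"))
    return result


def uid_is_consistent(mmdsh_output):
    """Return True if at least one host was parsed and all hosts report the same UID."""
    uids = uids_by_host(mmdsh_output)
    return len(set(uids.values())) == 1
```

Lines that do not match the assumed format, such as error messages for a missing user, are ignored, so an empty result from uids_by_host is itself worth investigating.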

After the Ambari deployment, check the installed IBM Spectrum Scale packages on all nodes by using

rpm -qa | grep gpfs

to verify that all base IBM Spectrum Scale packages have been installed.

Run wordcount to test the installation

For example:

For user fvt255, create the user's directory in the Hadoop file system if it does not exist.

[root@ ~]# hadoop fs -mkdir /user/fvt255
[root@ ~]# hadoop fs -chown -R fvt255:users /user/fvt255

Copy the mywordcountfile file to be used as input to the wordcount program, and then run the wordcount program.

[fvt255@ ~]$ yarn jar /usr/iop/4.1.0.0/hadoop-mapreduce/hadoop-mapreduce-examples-2.7.1-IBM-8.jar wordcount /user/fvt255/wc_input/mywordcountfile /user/fvt255/wc_output
[fvt255@ ~]$ hadoop fs -ls wc_output
Found 2 items
-rw-r--r-- 3 fvt255 users 0 2016-02-08 16:48 wc_output/_SUCCESS
-rw-r--r-- 3 fvt255 users 14219 2016-02-08 16:48 wc_output/part-r-00000

Check the Hadoop GUI

http://<Hadoop_Server>:8042/node

Appendix

A. Preparing a stanza File

The Ambari install process can install and configure a new IBM Spectrum Scale cluster file system and configure it for Hadoop workloads according to best practices. To support this task, the installer must know about the disks available in the cluster and how you want to use them. If you do not indicate preferences, intelligent defaults are used.

Note: Using Ambari to deploy a new IBM Spectrum Scale cluster is supported only for FPO.

Two types of NSD files are supported for file system auto-creation. One is the preferred simple format, and the other is the standard IBM Spectrum Scale NSD format intended for experienced IBM Spectrum Scale administrators.

Preferred Simple Format

o Ambari selects the proper metadata and data ratios.
o If possible, Ambari creates partitions on some disks for Hadoop intermediate data to enhance performance.
o One system pool and one data pool are created.
o The NSD file must be located at /var/lib/ambari-server/resources/ on the Ambari server.
o Only /dev/sdX and /dev/dx-X devices are supported.

Standard Format

o The GPFS administrator is responsible for the storage arrangement and configuration.
o A policy file is also required.
o Storage pools and block sizes can be defined as needed.

Example of a Preferred Simple IBM Spectrum Scale NSD File

This tells the IBM Spectrum Scale setup process that there are six nodes, each with six disk drives to be defined as NSDs. All information must be continuous with no extra spaces.

# cat /var/lib/ambari-server/resources/gpfs_nsd
DISK|compute001.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg
DISK|compute002.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg
DISK|compute003.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg
DISK|compute005.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg
DISK|compute006.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg
DISK|compute007.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg

If you want to select specific disks for metadata, such as SSD drives, add the label -meta to those disks in the simple NSD file, as shown in the following example. If -meta is used, the Partition algorithm is ignored.

# cat /var/lib/ambari-server/resources/gpfs_nsd
DISK|compute001.private.dns.zone:/dev/sdb-meta,/dev/sdc,/dev/sdd
DISK|compute002.private.dns.zone:/dev/sdb-meta,/dev/sdc,/dev/sdd
DISK|compute003.private.dns.zone:/dev/sdb-meta,/dev/sdc,/dev/sdd
DISK|compute005.private.dns.zone:/dev/sdb-meta,/dev/sdc,/dev/sdd
DISK|compute006.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd
DISK|compute007.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd
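Because the simple format must be continuous with no extra spaces, it can be useful to parse a file before handing it to Ambari. The following Python sketch reads the DISK|host:device,... lines shown in the examples and flags devices that carry the -meta label. It is an illustration based only on those examples; parse_simple_nsd is a hypothetical helper and does not reflect how Ambari itself parses the file.

```python
def parse_simple_nsd(text):
    """Parse simple-format NSD lines into {host: [(device, is_meta), ...]}."""
    nodes = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between entries
        if " " in line or not line.startswith("DISK|"):
            raise ValueError(f"unexpected line: {line!r}")
        host, sep, devices = line[len("DISK|"):].partition(":")
        if not sep or not host or not devices:
            raise ValueError(f"missing host or device list: {line!r}")
        disks = []
        for dev in devices.split(","):
            is_meta = dev.endswith("-meta")
            name = dev[: -len("-meta")] if is_meta else dev
            disks.append((name, is_meta))
        nodes[host] = disks
    return nodes
```

Applied to the example with the -meta label, the entry for compute001.private.dns.zone starts with ("/dev/sdb", True), while the same device on compute006.private.dns.zone is returned as ("/dev/sdb", False).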

In the simple NSD file above, /dev/sdb from compute001, compute002, compute003, and compute005 are specified as meta disks in the IBM Spectrum Scale file system. The partition algorithm is ignored if the nodes listed in the simple NSD file do not match the set of nodes that will be used for the NodeManager service. If some nodes that are not NodeManagers are in the NSD file, or some nodes that will be NodeManagers are not in the NSD file, no partitioning is done.

Example of a Standard IBM Spectrum Scale NSD File

%pool: pool=system blockSize=256K layoutMap=cluster allowWriteAffinity=no
%pool: pool=datapool blockSize=2M layoutMap=cluster allowWriteAffinity=yes writeAffinityDepth=1 blockGroupFactor=256
# gpfstest9
%nsd: nsd=node9_meta_sdb device=/dev/sdb servers=gpfstest9 usage=metadataOnly failureGroup=101 pool=system
%nsd: nsd=node9_meta_sdc device=/dev/sdc servers=gpfstest9 usage=metadataOnly failureGroup=101 pool=system
%nsd: nsd=node9_data_sde2 device=/dev/sde2 servers=gpfstest9 usage=dataOnly failureGroup=1,0,1 pool=datapool
%nsd: nsd=node9_data_sdf2 device=/dev/sdf2 servers=gpfstest9 usage=dataOnly failureGroup=1,0,1 pool=datapool
# gpfstest10
%nsd: nsd=node10_meta_sdb device=/dev/sdb servers=gpfstest10 usage=metadataOnly failureGroup=201 pool=system
%nsd: nsd=node10_meta_sdc device=/dev/sdc servers=gpfstest10 usage=metadataOnly failureGroup=201 pool=system
%nsd: nsd=node10_data_sde2 device=/dev/sde2 servers=gpfstest10 usage=dataOnly failureGroup=2,0,1 pool=datapool
%nsd: nsd=node10_data_sdf2 device=/dev/sdf2 servers=gpfstest10 usage=dataOnly failureGroup=2,0,1 pool=datapool
# gpfstest11
%nsd: nsd=node11_meta_sdb device=/dev/sdb servers=gpfstest11 usage=metadataOnly failureGroup=301 pool=system
%nsd: nsd=node11_meta_sdc device=/dev/sdc servers=gpfstest11 usage=metadataOnly failureGroup=301 pool=system
%nsd: nsd=node11_data_sde2 device=/dev/sde2 servers=gpfstest11 usage=dataOnly failureGroup=3,0,1 pool=datapool
%nsd: nsd=node11_data_sdf2 device=/dev/sdf2 servers=gpfstest11 usage=dataOnly failureGroup=3,0,1 pool=datapool

Type the /var/lib/ambari-server/resources/gpfs_nsd filename into the NSD stanza field. If you are using a standard NSD stanza file, a policy file is also required.

Policy File

For example, bigpfs.pol:

RULE 'default' SET POOL 'datapool'

Because of the limitations of the Ambari framework, the NSD file must be copied to the Ambari server under the /var/lib/ambari-server/resources/ directory. Make sure that the correct file name is specified on the IBM Spectrum Scale Customize Services page.

FIGURE 16 AMBARI NSD STANZA

B. IBM Spectrum Scale-FPO Deployment

Disk-partitioning algorithm

If a simple NSD file is used without the -meta label, Ambari assigns metadata and data disks and partitions the disks according to the following rules:

1. If the number of nodes is less than four:

a. If the number of disks on each node is less than three, put all disks in the system pool with usage = metadataAndData. No partitioning is done.

b. If the number of disks on each node is greater than four, assign metadata-only and data-only disks at a ratio of 1:3 on each node, with a maximum of four metadata disks per node. Partitioning is done provided that all NodeManager nodes are also NSD nodes and have the same number of NSD disks.

2. If the number of nodes is greater than five:

a. If the number of disks on each node is less than two, put all disks in the system pool with usage = metadataAndData. Partitioning is not done.

b. Set four nodes as meta nodes, where the meta disks are located. The others are data nodes.

c. Failure groups are created based on the failure group selection rules.

d. Assign meta disks and data disks on the meta nodes. Assign only data disks on the data nodes. The ratio follows best practice and is between 1:3 and 1:10.

e. If all NodeManager nodes have the same number of NSD disks, create a local partition on the data disks for Hadoop intermediate data.

Failure Group selection rules

Failure groups are created based on the rack allocation of the nodes. One rack mapping file is supported (see Rack Mapping File). Ambari reads this file and assigns one failure group per rack. The number of racks must be three or greater. If a rack mapping file is not provided, virtual racks are created for data fault tolerance.

1. If the node number is less than four, each node is on a different rack.

2. If the node number is greater than five and node number is greater than 10, every two nodes are put in

one virtual rack.

3. If the node number is greater than ten and node number is less than 21, every three nodes are put in one

virtual rack.

4. If the node number is less than 22, every 10 nodes are put in one virtual rack.

Rack Mapping File

Nodes can be defined to belong to racks. For three or more racks, the failure group of each NSD corresponds to the rack that its node is in. A sample file is available on the Ambari server at /var/lib/ambari-server/resources/.

# cat /var/lib/ambari-server/resources/racks.sample
#Host/Rack map configuration file
#Format:
#[hostname]:/[rackname]
#Example:
#mn01:/rack1
#NOTE:
#The first character in rack name must be "/"
mn03:/rack1
mn04:/rack2
dn02:/rack3
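The rack mapping format can be validated with a few lines of Python before the file is used. The sketch below follows only what the sample file states: lines beginning with # are comments, each entry is hostname:/rackname, and the rack name must start with "/". The function names are hypothetical.

```python
def parse_rack_map(text):
    """Parse rack mapping content into {hostname: rackname}."""
    racks = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        host, sep, rack = line.partition(":")
        if not sep or not host or not rack.startswith("/"):
            raise ValueError(f"expected [hostname]:/[rackname], got {line!r}")
        racks[host] = rack
    return racks


def hosts_per_rack(racks):
    """Group hostnames by rack so that the number of racks can be checked."""
    grouped = {}
    for host, rack in racks.items():
        grouped.setdefault(rack, []).append(host)
    return grouped
```

For the sample file, hosts_per_rack returns three racks with one host each, which meets the requirement of three or more racks.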

FIGURE 17 AMBARI RACK MAPPING

Partitioning Function Matrix in Automatic Deployment

Each data disk is divided into two partitions because one partition will be used for an ext4 file system to store the map or reduce intermediate data, while another partition will be used as a data disk in the IBM Spectrum Scale file system. Only data disks can be partitioned. Meta disks cannot be partitioned. On the other hand, if a node is not selected as a NodeManager for YARN, then there will not be any map or reduce tasks running on that node. In this case, partitioning the disks of the node makes no sense because the local partition will not be used.

The following table describes the partitioning function matrix:

TABLE 5 IBM SPECTRUM SCALE PARTITIONING FUNCTION MATRIX

#1: <node manager host list> == <IBM Spectrum Scale NSD server nodes>
The node manager host list is equal to the IBM Spectrum Scale NSD server nodes.

   Specify the standard NSD file: No partitioning. Create the NSDs directly with the NSD file.

   Specify the simple NSD file without the -meta label: Partition and select meta disks for the customer according to the Disk-partitioning algorithm and Failure Group selection rules.

   Specify the simple NSD file with the -meta label: No partitioning. All disks marked with the -meta label are used for metadata NSD disks. All others are marked as data NSDs.

#2: <node manager host list> > <IBM Spectrum Scale NSD server nodes>
Some node manager hosts are not in the IBM Spectrum Scale NSD server nodes, but all IBM Spectrum Scale NSD server nodes are in the node manager host list.

   Specify the standard NSD file: No partitioning. Create the NSDs directly with the specified NSD file.

   Specify the simple NSD file without the -meta label: No partitioning, but select meta disks for the customer according to the Disk-partitioning algorithm and Failure Group selection rules.

   Specify the simple NSD file with the -meta label: No partitioning. All disks marked with the -meta label are used for metadata NSD disks. All others are marked as data NSDs.

#3: <node manager host list> < <IBM Spectrum Scale NSD server nodes>
Some IBM Spectrum Scale NSD server nodes are not in the node manager host list, but all node manager hosts are in the IBM Spectrum Scale NSD server nodes.

   Specify the standard NSD file: No partitioning. Create the NSDs directly with the specified NSD file.

   Specify the simple NSD file without the -meta label: No partitioning, but select meta disks for the customer according to the Disk-partitioning algorithm and Failure Group selection rules.

   Specify the simple NSD file with the -meta label: No partitioning. All disks marked with the -meta label are used for metadata NSD disks. All others are marked as data NSDs.

For standard NSD files or simple NSD files with the -meta label, the IBM Spectrum Scale NSDs and file system are created directly.
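The partitioning function matrix reduces to a small decision function, sketched here in Python. The function name, argument names, and return shape are hypothetical; the sketch covers only the matrix and does not apply the further conditions of the Disk-partitioning algorithm, such as all NodeManager nodes having the same number of NSD disks.

```python
def partitioning_plan(nsd_file_type, has_meta_label, nodemanager_hosts, nsd_server_hosts):
    """Return (partition_disks, auto_select_meta) according to the matrix.

    nsd_file_type is "standard" or "simple"; has_meta_label applies to simple files.
    """
    if nsd_file_type == "standard":
        return (False, False)  # NSDs are created directly from the file
    if nsd_file_type != "simple":
        raise ValueError(f"unknown NSD file type: {nsd_file_type!r}")
    if has_meta_label:
        return (False, False)  # the -meta labels decide the metadata disks
    # Simple file without -meta: meta disks are always selected automatically,
    # but disks are partitioned only when both host lists are identical.
    same_hosts = set(nodemanager_hosts) == set(nsd_server_hosts)
    return (same_hosts, True)
```

Only the first row of the matrix combined with a simple NSD file without the -meta label yields partitioning; every other combination returns False for partition_disks.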

To specify which disks are used for metadata and also have the data disks partitioned, first partition the disks with the script partition_disks_general.sh, found in Attachments at the bottom of the References: BigInsight Enterprise Manager wiki, and then specify the partitions that are used for GPFS NSDs in a simple NSD file. For example:

[root@compute000 GPFS]# cat /var/lib/ambari-server/resources/gpfs_nsd
DISK|compute001.private.dns.zone:/dev/sdb-meta,/dev/sdc2,/dev/sdd2
DISK|compute002.private.dns.zone:/dev/sdb-meta,/dev/sdc2,/dev/sdd2
DISK|compute003.private.dns.zone:/dev/sdb-meta,/dev/sdc2,/dev/sdd2
DISK|compute005.private.dns.zone:/dev/sdb-meta,/dev/sdc2,/dev/sdd2
DISK|compute006.private.dns.zone:/dev/sdb,/dev/sdc2,/dev/sdd2
DISK|compute007.private.dns.zone:/dev/sdb,/dev/sdc2,/dev/sdd2

After deploying in this mode, manually update yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs to contain the directory list from the disk partitions that are used for map/reduce intermediate data.

C. Dual-network deployment

In your cluster, if each node has two network adapters, it is called a dual-network cluster. You can assign one network for Ambari services, such as YARN and HBase, and another network for IBM Spectrum Scale data transfer. Configuring the network settings this way can improve network bandwidth for both IBM Spectrum Scale and Hadoop services, such as YARN, because both consume a lot of network bandwidth when data I/O operations are running on the Hadoop cluster.

If the two network adapters are a 1Gb network and a 10Gb network, route all services over the 10Gb network because YARN-like workloads need a lot of network bandwidth. If these services are routed over the 1Gb network, map and reduce job performance is impacted.

Ambari does not support dual networks if one network adapter is used for IBM Spectrum Scale and the second network adapter is used for Hadoop services because Ambari uses the same host list for both services. Manual deployment is required to create this dual network cluster environment. For more information, see References: Deploy BigInsights IOP 4.1 over IBM Spectrum Scale.

Two network adapters, configured with different sub-network addresses

If each node in the cluster has two network adapters, where one is configured with one sub-network address, 192.168.1.x/24, and the other is configured with another sub-network address, 192.168.2.x/24, the following decisions must be made:

The sub-network address to be used for all Ambari services (assume that it is 192.168.1.x).

The sub-network address to be used for IBM Spectrum Scale node-to-node data transfer (assume that it is 192.168.2.x).


When deploying IOP + IBM Spectrum Scale through Ambari, specify the IP addresses or corresponding host name list for 192.168.1.x. After deployment has been completed through Ambari, perform the following steps:

1. Stop all services in the Ambari GUI by selecting Actions from the left panel.

2. SSH to any one of the IBM Spectrum Scale nodes and run mmchconfig subnets=192.168.2.0 -N all

3. Start all services on Ambari.

After completing the preceding steps, run the system monitor tool nmon which can be downloaded from http://sourceforge.net/projects/nmon/ to ensure that there is obvious network traffic over the 192.168.2.x network adapter when you write data into IBM Spectrum Scale.
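As a scriptable complement to nmon, the byte counters that the Linux kernel exposes in /proc/net/dev can be sampled before and after a write. The following is a minimal sketch. The interface name eth1 is an assumption for the adapter on the 192.168.2.x network, and iface_bytes is a hypothetical helper.

```shell
#!/usr/bin/env bash
# iface_bytes: hypothetical helper that prints "<rx-bytes> <tx-bytes>" for an interface.
# $1 = interface name, $2 = file in /proc/net/dev format (defaults to the live file).
iface_bytes() {
    # Replace the colon after the interface name with a space so that both
    # "eth1: 5000" and "eth1:5000" split into the same fields.
    awk -v ifc="$1" '{ sub(/:/, " ") } $1 == ifc { print $2, $10 }' "${2:-/proc/net/dev}"
}

# Example (eth1 is an assumed interface name):
#   iface_bytes eth1        # sample before writing data
#   ... write a large file into IBM Spectrum Scale ...
#   iface_bytes eth1        # sample again; both counters should have grown
```

If the counters of the IBM Spectrum Scale adapter hardly change while data is written, the daemon traffic is probably still using the other network.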

Two network adapters, configured with the same sub-network address

If each node in the cluster has two network adapters that are both configured with addresses from the same sub-network, for example 192.168.1.10 for one adapter and 192.168.1.11 for the other, the following decisions must be made:

The network address to be used for all the Ambari services (assume that it is 192.168.1.<address1>).

The network address to be used for IBM Spectrum Scale node-to-node data transfer (assume that it is 192.168.1.<address2>).

While deploying IOP + IBM Spectrum Scale through Ambari, specify the IP addresses or the corresponding host name list for 192.168.1.<address1>. After deployment has been completed through Ambari, perform the following steps:

1. Stop all services in the Ambari GUI (at the bottom of the left panel, select Actions).

2. SSH to any one of the IBM Spectrum Scale nodes and run the following commands:

mmchcluster --ccr-disable
mmchnode --daemon-interface=hostnameY -N hostnameX <== do this for all nodes
mmchcluster --ccr-enable

Note:

hostnameX must be changed to the real host name that represents 192.168.1.<address1> of the node that is to be changed.

hostnameY must be changed to the real host name that represents 192.168.1.<address2> of the node that is to be changed.

For example, you could get hostnameX from the output of /usr/lpp/mmfs/bin/mmlscluster (the value that is displayed in the “Admin node name” column):

Node  Daemon node name       IP address     Admin node name        Designation
--------------------------------------------------------------------------------
   1  gpfstest10.cn.ibm.com  192.168.10.10  gpfstest10.cn.ibm.com  quorum
   2  gpfstest11.cn.ibm.com  192.168.10.11  gpfstest11.cn.ibm.com
   3  gpfstest12.cn.ibm.com  192.168.10.12  gpfstest12.cn.ibm.com


Run

mmchnode --daemon-interface=gpfstest10g.cn.ibm.com -N gpfstest10.cn.ibm.com

to change the daemon interface of gpfstest10.cn.ibm.com to gpfstest10g.cn.ibm.com. Here, gpfstest10g.cn.ibm.com replaces hostnameY.

Do this for all nodes accordingly.

3. Start all services on Ambari.

After completing the preceding steps, run the system monitor tool nmon, which can be downloaded from http://sourceforge.net/projects/nmon/, to confirm that there is obvious network traffic over the network adapter with the address 192.168.1.<address2> when you write data into IBM Spectrum Scale.
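Because the mmchnode command in step 2 must be repeated for every node, it can help to generate the command list from a mapping file and review it before running anything. The following sketch only prints the commands. The mapping file format (one "hostnameX hostnameY" pair per line) and the function name are assumptions for illustration.

```shell
#!/usr/bin/env bash
# print_daemon_interface_cmds: hypothetical dry-run helper.
# $1 = mapping file with one "<admin node name> <new daemon host name>" pair per line.
# Prints the command sequence from step 2; it does not execute anything.
print_daemon_interface_cmds() {
    echo "mmchcluster --ccr-disable"
    while read -r admin_name daemon_name; do
        # skip empty or incomplete lines
        if [ -z "$admin_name" ] || [ -z "$daemon_name" ]; then
            continue
        fi
        echo "mmchnode --daemon-interface=$daemon_name -N $admin_name"
    done < "$1"
    echo "mmchcluster --ccr-enable"
}
```

After the printed list has been reviewed, it can be run on one IBM Spectrum Scale node while all services are stopped.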

D. BigInsights Value Add Services on IBM Spectrum Scale

For IBM Spectrum Scale, some minor adjustments are required to the standard BigInsights Value Add installation instructions.

1. Perform the preparation steps for BigInsights Value Adds:

https://ibm.biz/BdHbiz

2. After you install the BigInsights Analyst (BI-Analyst-IOP-x-x-x.rpm) and Data Scientist (BI-DS-IOP-x-x-x.rpm) RPMs, the ambari-server resource stacks are updated in

/var/lib/ambari-server/resources/stacks/BigInsights/4.1/repos/repoinfo.xml

Modify the file to reference your local repository for the BigInsights value adds.

For example:

<repo>
  <baseurl>http://<YUM-Server>/repos/valueadds</baseurl>
  <repoid>BIGINSIGHTS-VALUEPACK.4.1.0.2</repoid>
  <reponame>BIGINSIGHTS-VALUEPACK.4.1.0.2</reponame>
</repo>

3. Look for the same file in the IBM Spectrum Scale stack subdirectory and add the same repo definition as in the prior step:

/var/lib/ambari-server/resources/stacks/BigInsights/4.1.SpectrumScale/repos/repoinfo.xml


Insert the same local repository information in the correct OS version section.

4. Clear the cached YUM packages and headers, and then restart the Ambari server:

yum clean all; yum makecache

ambari-server restart
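Before restarting, it can be worth confirming that both repoinfo.xml files contain the value-add repository. The following is a minimal sketch, assuming that the <repoid> element is written without inner whitespace as in the example above. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# has_repoid: true if file $1 contains <repoid>$2</repoid> (fixed-string match).
has_repoid() {
    grep -qF "<repoid>$2</repoid>" "$1"
}

# check_valueadd_repo: hypothetical helper that checks both stack directories.
# $1 = repo ID to look for, $2 = stacks base directory (defaults to the path used in this guide).
check_valueadd_repo() {
    repoid=$1
    base=${2:-/var/lib/ambari-server/resources/stacks/BigInsights}
    rc=0
    for stack in 4.1 4.1.SpectrumScale; do
        f="$base/$stack/repos/repoinfo.xml"
        if has_repoid "$f" "$repoid" 2>/dev/null; then
            echo "ok      $f"
        else
            echo "missing $f"      # file absent or repo ID not found
            rc=1
        fi
    done
    return $rc
}

# Example: check_valueadd_repo BIGINSIGHTS-VALUEPACK.4.1.0.2
```

A "missing" line for the 4.1.SpectrumScale stack usually means that step 3 was skipped.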

Troubleshooting Value Add Services

Why can’t BigSheets see files in IBM Spectrum Scale?

Perform the following changes and restart BigSheets.

Step 1: Ensure that the GPFS connector is version hadoop-gpfs-2.7.0-5 or later.

To determine the version that is currently installed, use:

rpm -qa | grep connector

If the installed connector does not meet this requirement, see the additional resources below to acquire the latest available connector.

Step 2: After installation, the BigSheets application must link to the IBM Spectrum Scale Hadoop JAR file.

Perform the following on the BigSheets master node:

cd /usr/ibmpacks/bigsheets/<version>/jetty/lib/ext/

ln -s /usr/iop/current/hadoop-client/hadoop-gpfs.jar

Step 3: If you are using Knox Demo LDAP, the user ID that is used to authenticate with Knox must also be created in the operating system on all nodes.

Knox demo LDAP is for demonstration purposes only and does not integrate with the operating system of Hadoop nodes. Therefore, users created in Knox demo LDAP do not exist outside of Knox. If both Knox and the operating systems of all Hadoop nodes are configured for the same LDAP server, this step is not required.
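The version requirement in Step 1 can be checked in a script with a version-aware comparison. The following is a minimal sketch that relies on sort -V from GNU coreutils; version_ge is a hypothetical helper, and the package name in the usage comment is taken from the yum output elsewhere in this guide and might differ on your system.

```shell
#!/usr/bin/env bash
# version_ge: true if version $1 is greater than or equal to version $2.
# Uses GNU "sort -V" so that, for example, 2.7.0-10 sorts after 2.7.0-5.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Example (package name is an assumption, adjust it to your installation):
#   installed=$(rpm -q --queryformat '%{VERSION}-%{RELEASE}\n' gpfs.hadoop-connector)
#   version_ge "$installed" 2.7.0-5 || echo "connector too old: $installed"
```

A plain string comparison would rank 2.7.0-10 below 2.7.0-5, which is why the sketch sorts by version instead.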

Issues with Text Analytics

Problem: Text Analytics cannot see files in IBM Spectrum Scale.

Solution: The Text Analytics application needs to load hadoop-gpfs.jar. Perform the following change and restart Text Analytics.


In the file:

/usr/ibmpacks/text-analytics-web-tooling/<version>/jetty-distribution-<version>/contexts/text-analytics-web-tooling.xml

Extend extraClasspath so that it includes hadoop-gpfs.jar, as in:

<Set name="extraClasspath">/usr/iop/current/hadoop-client/conf,/usr/iop/current/hadoop-client/hadoop-gpfs.jar,………</Set>

Problem: Text Analytics Run on Cluster fails with the error: The specified extractors could not be executed due to an unexpected error. Verify that you have read and write access to the directories you have selected, and that the file names you have entered are valid.

Solution: Check the logs located under:

/usr/ibmpacks/text-analytics-web-tooling/3.8/jetty-distribution-8.1.17.v20150415/logs

If you see exceptions like:

“java.io.FileNotFoundException: GPFSC00007E: File does not exist: /user/tauser/lib”

You must manually create the directory by taking the following steps:

Step 1: Log in to the Text Analytics node.

Step 2: su - tauser

Step 3: hadoop dfs -mkdir /user/tauser/lib

Step 4: Copy all JAR packages under

/usr/ibmpacks/text-analytics-runtime/4.10/lib

and

/usr/ibmpacks/current/text-analytics-runtime/action-api/lib/

and their subdirectories into

/<gpfs-mount-point>/user/tauser/lib/
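Step 4 can be scripted with find. The following is a minimal sketch, assuming that all JAR files are meant to land directly in the target directory without preserving the subdirectory layout. If two JAR files share a name, the later copy overwrites the earlier one. The function name copy_jars is hypothetical.

```shell
#!/usr/bin/env bash
# copy_jars: hypothetical helper.
# $1 = destination directory, remaining arguments = source directories.
# Copies every *.jar found under the sources (including subdirectories)
# into the destination, flattening the directory structure.
copy_jars() {
    dest=$1; shift
    mkdir -p "$dest"
    for src in "$@"; do
        find "$src" -type f -name '*.jar' -exec cp {} "$dest"/ \;
    done
}

# Example, run as tauser (replace <gpfs-mount-point> with your mount point):
#   copy_jars /<gpfs-mount-point>/user/tauser/lib \
#       /usr/ibmpacks/text-analytics-runtime/4.10/lib \
#       /usr/ibmpacks/current/text-analytics-runtime/action-api/lib
```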

Problem: Text Analytics Run on Cluster fails with the error:

The specified extractors are not executed because a valid cluster configuration could not be found. Contact the system administrator to verify that the cluster is running correctly and that the cluster configuration is available on your server. If you indicated that the execution artifacts must be generated, those artifacts must still be available at the specified location, even though the extractors were not executed.

Solution: This is a known issue for BigInsights V4.1.0.1 and specific to IBM Spectrum Scale support. Later versions might not have this problem. If you encounter this error, contact IBM support and reference defect 97360.

Issues with Big R

1. Big R for GPFS requires HTTPFS to be configured as a workaround for WebHDFS. Verify the HTTPFS configuration according to Appendix A.

2. On all nodes, make hadoop-gpfs.jar available to Big R:

vi /usr/ibmpacks/bigr/<version>/bigr-Jaql/<version>/conf/jaql.xml

Add hadoop-gpfs.jar to the list of jars in the jaql.job.jars name/value section. For example:

…
<name>jaql.job.jars</name>
<value>jaql.jar,commons-lang-2.5.jar, icu4j.jar,….. ,hadoop-gpfs.jar</value>
…

3. On all nodes, create a symbolic link to hadoop-gpfs.jar in two places:

cd /usr/ibmpacks/bigr/<version>/bigr-Jaql/<version>/lib
ln -s /usr/iop/current/hadoop-client/hadoop-gpfs.jar
cd /usr/ibmpacks/bigr/<version>/bigr-bigsql1/<version>/lib/ext
ln -s /usr/iop/current/hadoop-client/hadoop-gpfs.jar

4. Validate the permissions on <gpfs_mount>/tmp/bigr. Ensure that <gpfs_mount>/tmp/bigr is writable for all users:

chmod 777 <gpfs_mount>/tmp/bigr

5. Restart Big R and run the Big R service check.
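Because the symbolic links must be created on every node, a variant that can be rerun safely is convenient. The following is a minimal sketch that uses ln -sfn so that an existing link is replaced instead of causing an error. The function name link_gpfs_jar is hypothetical.

```shell
#!/usr/bin/env bash
# link_gpfs_jar: hypothetical helper.
# $1 = path of the JAR file, remaining arguments = directories that need a link to it.
# "ln -sfn" replaces an existing link, so the function can be run repeatedly.
link_gpfs_jar() {
    jar=$1; shift
    for dir in "$@"; do
        ln -sfn "$jar" "$dir/$(basename "$jar")"
    done
}

# Example (replace <version> with the installed versions):
#   link_gpfs_jar /usr/iop/current/hadoop-client/hadoop-gpfs.jar \
#       /usr/ibmpacks/bigr/<version>/bigr-Jaql/<version>/lib \
#       /usr/ibmpacks/bigr/<version>/bigr-bigsql1/<version>/lib/ext
```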

E. Node management

Add Node

New nodes can be added via the Ambari Web GUI.

1. Access the Hosts tab in the Ambari GUI (Illustration 5.a), and click Add New Hosts from the Actions button.


2. Specify the new node information, and then click Register and Confirm.

Note: In illustration 5.b, the SSH Private Key is the key of the user on the Ambari Server.

3. Select the services that you need to install on to this new node.


4. If several configuration groups were created, select one of them for the new node.

5. Start the deployment by clicking Deploy.


6. Restart the IBM Spectrum Scale Master service on the IBM Spectrum Scale master node. This action updates the pagepool configuration on all the new nodes. If several nodes are added, restart once, after all nodes are deployed.


7. Ambari does not create NSDs on the new nodes. To create IBM Spectrum Scale NSDs and add them to the file system, follow the instructions at References: Deploy BigInsights IOP 4.1 over IBM Spectrum Scale.

Remove Node

Note:

Before removing a node, check that the following conditions are met:

1. The free space that remains in the IBM Spectrum Scale file system after the node is removed is enough to hold all data.

2. The number of remaining quorum nodes is enough to keep the IBM Spectrum Scale cluster up.

3. The number of failure groups is greater than the number of data replicas.
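The second condition can be checked from the output of /usr/lpp/mmfs/bin/mmlscluster. The following is a minimal sketch, assuming that the node table has the column layout shown earlier in this guide (node number first, designation fifth). The function name count_quorum_nodes is hypothetical.

```shell
#!/usr/bin/env bash
# count_quorum_nodes: hypothetical helper.
# Reads mmlscluster output on stdin and prints how many node rows carry a
# designation that contains "quorum" (for example "quorum" or "quorum-manager").
count_quorum_nodes() {
    awk '$1 ~ /^[0-9]+$/ && $5 ~ /quorum/ { n++ } END { print n + 0 }'
}

# Example:
#   /usr/lpp/mmfs/bin/mmlscluster | count_quorum_nodes
```

If the node to be removed is itself a quorum node, compare the count minus one against the number of quorum nodes that your cluster needs.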

Automatically removing an IBM Spectrum Scale node is not supported in this release. To remove an IBM Spectrum Scale node, some manual steps are required.

This example shows how to remove node compute002.

1. Click the node name on the Hosts tab (see Figure 18).

FIGURE 18 AMBARI HOSTS PANEL

2. Stop all services on this node except the GPFS Node component (see Figure 19).


FIGURE 19 AMBARI HOSTS GPFS NODE COMPONENTS

3. Log on to the node terminal through SSH. Then, remove the NSDs of this node from the IBM Spectrum Scale file system. If this node is not an NSD node, skip this step.

[root@compute002 ~]# cat nsd.stanza
%pool:pool=system layoutMap=cluster blocksize=256K
%pool:pool=datapool layoutMap=cluster blocksize=2048K allowWriteAffinity=yes writeAffinityDepth=1 blockGroupFactor=128
%nsd: nsd=gpfs33nsd device=/dev/sdd servers=compute002 usage=dataOnly failureGroup=2,0,3 pool=datapool
%nsd: nsd=gpfs34nsd device=/dev/sde servers=compute002 usage=dataOnly failureGroup=2,0,3 pool=datapool
%nsd: nsd=gpfs35nsd device=/dev/sdf servers=compute002 usage=dataOnly failureGroup=2,0,3 pool=datapool
[root@compute002 ~]# /usr/lpp/mmfs/bin/mmdeldisk bigpfs -F nsd.stanza
Deleting disks ...
Scanning file system metadata, phase 1 ...
Scan completed successfully.
Scanning file system metadata, phase 2 ...
Scanning file system metadata for datapool storage pool
Scan completed successfully.
Scanning file system metadata, phase 3 ...
Scan completed successfully.
Scanning file system metadata, phase 4 ...
Scan completed successfully.
Scanning user file metadata ...
55.31 % complete on Wed Aug 19 04:00:34 2015 ( 500352 inodes with total 10525 MB data processed)
100.00 % complete on Wed Aug 19 04:00:37 2015 ( 500352 inodes with total 13585 MB data processed)
Scan completed successfully.
Checking Allocation Map for storage pool system
Checking Allocation Map for storage pool datapool
tsdeldisk completed.
mmdeldisk: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process.
[root@compute002 ~]# /usr/lpp/mmfs/bin/mmdelnsd -F nsd.stanza
mmdelnsd: Processing disk gpfs33nsd
mmdelnsd: Processing disk gpfs34nsd
mmdelnsd: Processing disk gpfs35nsd
mmdelnsd: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process.

4. Stop the IBM Spectrum Scale Node service from the Ambari GUI (see Figure 20).

FIGURE 20 AMBARI HOSTS ACTIONS

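5. On the node that is being removed, stop the Hadoop connector. Then log on to the IBM Spectrum Scale master node, detach the connector from the node, and remove the node from the IBM Spectrum Scale cluster. Finally, erase the GPFS packages on the removed node.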

[root@compute002 ~]# /usr/lpp/mmfs/bin/mmhadoopctl connector stop
Hadoop connector 'gpfs-connector-daemon' stopped.
[root@compute002 ~]# ssh compute001
Last login: Thu Aug 20 05:23:43 2015 from compute000
[root@compute001 ~]# /usr/lpp/mmfs/bin/mmhadoopctl connector detach --distribution BigInsights -N compute002
DISTRIBUTION=biginsights
VERSION=4.1
ARCH=Linux-amd64-64
CONNECTOR_DIR=/usr/lpp/mmfs/hadoop
SRC_JAR=/usr/lpp/mmfs/hadoop/hadoop-gpfs-2.7.0.jar
JAR_FILE=hadoop-gpfs.jar
JAR_DIR=/usr/iop/current/hadoop-client
OOZIE_SERVER_DIR=/usr/iop/current/oozie-server
SOLR_SERVER_DIR=/usr/iop/current/solr-server/server
SLIDER_JAR_DIR=/usr/iop/current/slider-client/lib/
From compute002: Remove connector
rm -f /usr/iop/current/hadoop-client/hadoop-gpfs.jar succeeded.
rm -f /usr/iop/current/oozie-server/libext/hadoop-gpfs.jar succeeded.
rm -f /usr/iop/current/slider-client/lib//hadoop-gpfs.jar succeeded.


[root@compute001 ~]# /usr/lpp/mmfs/bin/mmdelnode compute002
Verifying GPFS is stopped on all affected nodes ...
mmdelnode: Command successfully completed
mmdelnode: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process.
[root@compute001 ~]# ssh compute002
Last login: Wed Aug 19 04:09:41 2015 from compute001
[root@compute002 ~]# yum erase gpfs.*
Loaded plugins: product-id, security, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
Setting up Remove Process
Resolving Dependencies
--> Running transaction check
---> Package gpfs.base.x86_64 0:4.1.1-1 will be erased
---> Package gpfs.crypto.x86_64 0:4.1.1-1 will be erased
---> Package gpfs.docs.noarch 0:4.1.1-1 will be erased
---> Package gpfs.ext.x86_64 0:4.1.1-1 will be erased
---> Package gpfs.gpl.noarch 0:4.1.1-1 will be erased
---> Package gpfs.gskit.x86_64 0:8.0.50-40 will be erased
---> Package gpfs.hadoop-connector.x86_64 0:2.7.0-1 will be erased
---> Package gpfs.msg.en_US.noarch 0:4.1.1-1 will be erased
--> Finished Dependency Resolution
Dependencies Resolved
================================================================================
 Package                 Arch      Version      Repository     Size
================================================================================
Removing:
 gpfs.base               x86_64    4.1.1-1      @GPFS-4.1.1     42 M
 gpfs.crypto             x86_64    4.1.1-1      @GPFS-4.1.1    403 k
 gpfs.docs               noarch    4.1.1-1      @GPFS-4.1.1    1.5 M
 gpfs.ext                x86_64    4.1.1-1      @GPFS-4.1.1    6.6 M
 gpfs.gpl                noarch    4.1.1-1      @GPFS-4.1.1    2.5 M
 gpfs.gskit              x86_64    8.0.50-40    @GPFS-4.1.1     28 M
 gpfs.hadoop-connector   x86_64    2.7.0-1      @GPFS-4.1.1    959 k
 gpfs.msg.en_US          noarch    4.1.1-1      @GPFS-4.1.1    601 k
Transaction Summary
================================================================================
Remove 8 Package(s)
Installed size: 83 M
Is this ok [y/N]: y
Downloading Packages:
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Erasing : gpfs.crypto-4.1.1-1.x86_64 1/8
  Erasing : gpfs.gpl-4.1.1-1.noarch 2/8
make[1]: Entering directory `/usr/lpp/mmfs/src'
….
rm -f -rf usr
make[2]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux'
make[1]: Leaving directory `/usr/lpp/mmfs/src'


  Erasing : gpfs.ext-4.1.1-1.x86_64 3/8
  Erasing : gpfs.hadoop-connector-2.7.0-1.x86_64 4/8
  Erasing : gpfs.base-4.1.1-1.x86_64 5/8
  Erasing : gpfs.msg.en_US-4.1.1-1.noarch 6/8
  Erasing : gpfs.docs-4.1.1-1.noarch 7/8
  Erasing : gpfs.gskit-8.0.50-40.x86_64 8/8
  Verifying : gpfs.gpl-4.1.1-1.noarch 1/8
  Verifying : gpfs.gskit-8.0.50-40.x86_64 2/8
  Verifying : gpfs.crypto-4.1.1-1.x86_64 3/8
  Verifying : gpfs.base-4.1.1-1.x86_64 4/8
  Verifying : gpfs.ext-4.1.1-1.x86_64 5/8
  Verifying : gpfs.hadoop-connector-2.7.0-1.x86_64 6/8
  Verifying : gpfs.docs-4.1.1-1.noarch 7/8
  Verifying : gpfs.msg.en_US-4.1.1-1.noarch 8/8
Removed:
  gpfs.base.x86_64 0:4.1.1-1
  gpfs.crypto.x86_64 0:4.1.1-1
  gpfs.docs.noarch 0:4.1.1-1
  gpfs.ext.x86_64 0:4.1.1-1
  gpfs.gpl.noarch 0:4.1.1-1
  gpfs.gskit.x86_64 0:8.0.50-40
  gpfs.hadoop-connector.x86_64 0:2.7.0-1
  gpfs.msg.en_US.noarch 0:4.1.1-1
Complete!

6. Log on to the node and stop the ambari-agent.

[root@compute002 ~]# ambari-agent stop
Verifying Python version compatibility...
Using python /usr/bin/python2.6
Found ambari-agent PID: 13754
Stopping ambari-agent
Removing PID file at /var/run/ambari-agent/ambari-agent.pid
ambari-agent successfully stopped

7. Delete the host by using the Ambari GUI (see Figure 21).

Notes: After this host is removed, Ambari ignores future communications from it. Software packages are not removed from the host. The components on the host must not be restarted. If you want to add this host to the cluster again, clean it first.


FIGURE 21 AMBARI HOSTS ACTIONS DELETE HOST

8. Clean the software packages. This is an optional step.

[root@compute002 ~]# python /usr/lib/python2.6/site-packages/ambari_agent/HostCleanup.py --silent --skip=users -f /etc/ambari-agent/conf/HostCleanup.ini,/etc/ambari-agent/conf/HostCleanup_Custom_Actions.ini
…
WARNING:HostCleanup:No alternatives found for: flume-conf
WARNING:HostCleanup:No alternatives found for: hadoop-conf
WARNING:HostCleanup:No alternatives found for: hadoop-httpfs-conf
WARNING:HostCleanup:No alternatives found for: hadoop-httpfs-tomcat-conf
WARNING:HostCleanup:No alternatives found for: hbase-conf
WARNING:HostCleanup:No alternatives found for: hive-webhcat-conf
WARNING:HostCleanup:No alternatives found for: hive-conf
WARNING:HostCleanup:No alternatives found for: hive-hcatalog-conf
WARNING:HostCleanup:No alternatives found for: kafka-conf
WARNING:HostCleanup:No alternatives found for: knox-conf
WARNING:HostCleanup:No alternatives found for: oozie-conf
WARNING:HostCleanup:No alternatives found for: oozie-tomcat-conf
WARNING:HostCleanup:No alternatives found for: pig-conf
WARNING:HostCleanup:No alternatives found for: spark-conf
WARNING:HostCleanup:No alternatives found for: sqoop-conf
WARNING:HostCleanup:No alternatives found for: zookeeper-conf
INFO:HostCleanup:Clean-up completed. The output is at /var/lib/ambari-agent/data/hostcleanup.result

F. Upgrade IBM Spectrum Scale to Latest PTF

You can update the IBM Spectrum Scale and IBM Spectrum Scale Hadoop Connector PTF packages through the Ambari server. Upgrading across releases is not supported. The IBM Spectrum Scale PTF and the IBM Spectrum Scale Hadoop connector can be upgraded separately.

1. Put all PTF RPM packages or the IBM Spectrum Scale Hadoop connector PTF RPM in the IBM Spectrum Scale YUM repository, which was created in chapter 2.3. In most cases, the IBM Spectrum Scale PTF package includes the following packages:

gpfs.base-4.1.1-x.ppc64le.update.rpm
gpfs.crypto-4.1.x-1.ppc64le.update.rpm
gpfs.docs-4.1.1-x.noarch.rpm
gpfs.ext-4.1.1-x.ppc64le.update.rpm
gpfs.gpl-4.1.1-x.noarch.rpm
gpfs.gskit-8.0.50-40.ppc64le.rpm
gpfs.msg.en_US-4.1.1-x.noarch.rpm

2. Go to the IBM Spectrum Scale Yum directory and rebuild the Yum database by using the command 'createrepo'

[root@smn GPFS]# createrepo .
Spawning worker 0 with 8 pkgs
Workers Finished
Gathering worker results
Saving Primary metadata
Saving file lists metadata
Saving other metadata
Generating sqlite DBs
Sqlite DBs complete
...

3. Log on to the Ambari web GUI and stop all services, including IBM Spectrum Scale.

4. Click IBM Spectrum Scale. Click Upgrade_SpectrumScale in the Service Actions drop-down list. If you need to upgrade the IBM Spectrum Scale Hadoop connector, click Upgrade_Connector.

FIGURE 22 AMBARI UPGRADE IBM SPECTRUM SCALE

G. Upgrade the IBM Spectrum Scale Ambari integration module

If the Ambari cluster has been installed with the gpfs.ambari-iop_4.1-1 package, you can upgrade to this gpfs.ambari-iop_4.1-2 release to get more powerful features such as IBM Spectrum Scale upgrade, Filesystem monitor, and Filesystem debug info collection. To upgrade from gpfs.ambari-iop_4.1-1, perform the following steps:

1. Stop all services through the Ambari GUI.


FIGURE 23 AMBARI DASHBOARD ACTIONS STOP ALL

2. Log on to the Ambari server node and delete the IBM Spectrum Scale service via the Ambari REST API.

[root@compute000 ~]# curl -u admin:admin -X DELETE -H "X-Requested-By: ambari" http://mn01-dat:8080/api/v1/clusters/[cluster_name]/services/GPFS

3. Copy the gpfs.ambari-iop_4.1-2 package to the Ambari server node.

4. Upgrade the gpfs.ambari package to 4.1-2.

If /var/lib/ambari-server/resources/stacks/BigInsights/4.1.SpectrumScale/repos/repoinfo.xml was updated, your version is kept and the new version is written alongside it with the suffix .rpmnew. Ignore the warning.

[root@compute000 ~]# /gpfs.ambari-iop_4.1-2.noarch.bin -u -q
Unpacking...
Done
Upgrading...
Preparing... ################################# [100%]
Updating / installing...
1:gpfs.ambari-iop_4.1-2 warning: /var/lib/ambari-server/resources/stacks/BigInsights/4.1.SpectrumScale/repos/repoinfo.xml created as /var/lib/ambari-server/resources/stacks/BigInsights/4.1.SpectrumScale/repos/repoinfo.xml.rpmnew
################################# [ 50%]
Cleaning up / removing...
2:gpfs.ambari-iop_4.1-1 ################################# [100%]

5. Restart the Ambari server.


[root@compute000 ~]# ambari-server restart
Using python /usr/bin/python2.7
Restarting ambari-server
Using python /usr/bin/python2.7
Stopping ambari-server
Ambari Server stopped
Using python /usr/bin/python2.7
Starting ambari-server
Ambari Server running with administrator privileges.
Organizing resource files at /var/lib/ambari-server/resources...
Server PID at: /var/run/ambari-server/ambari-server.pid
Server out at: /var/log/ambari-server/ambari-server.out
Server log at: /var/log/ambari-server/ambari-server.log
Waiting for server start....................
Ambari Server 'start' completed successfully.

6. Start the IBM Spectrum Scale cluster and verify that the file system is mounted.

[root@compute000 ~]# /usr/lpp/mmfs/bin/mmstartup -a
Wed Nov 25 21:05:09 EST 2015: mmstartup: Starting GPFS …

[root@compute000 ~]# /usr/lpp/mmfs/bin/mmlsmount all -L

File system bigpfs is mounted on 4 nodes:
  172.168.10.11 compute001
  172.168.10.12 compute002
  172.168.10.13 compute003
  172.168.10.14 compute004

7. Add the IBM Spectrum Scale service from the Ambari GUI.

a. Click '+ Add service' from the 'Action' drop down list on the Dashboard tab.


FIGURE 24 AMBARI DASHBOARD ADD SERVICES

b. Select IBM Spectrum Scale on the Choose Services page.


FIGURE 25 AMBARI UPGRADE CHOOSE SERVICES

c. Select the node that was the IBM Spectrum Scale master before the upgrade.

FIGURE 26 AMBARI ADD SERVICE WIZARD

d. Assign all nodes to the IBM Spectrum Scale Hadoop connector and the IBM Spectrum Scale node.

FIGURE 27 AMBARI ASSIGN NODES - HADOOP CONNECTOR + IBM SPECTRUM SCALE NODE


e. Verify that all the current IBM Spectrum Scale parameters are correct on the Customize Services page. If they are not, check whether the IBM Spectrum Scale cluster is functioning.

FIGURE 28 AMBARI CUSTOMIZE SERVICES VERIFICATION

f. Review the summary and start the deployment. This step does not actually deploy IBM Spectrum Scale; it only adds the IBM Spectrum Scale service to the Ambari server.

FIGURE 29 AMBARI REVIEW PANEL


g. Start other services.

After deployment, the IBM Spectrum Scale service is started automatically. Click Start All from the Action drop down list on the Dashboard tab.

FIGURE 30 AMBARI AFTER UPGRADE DASHBOARD
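Two of the steps above lend themselves to small scripted checks: confirming the service URL before the DELETE call in step 2, and confirming the mount count in step 6. The following is a minimal sketch. Both function names are hypothetical, and the host, port, and cluster name in the usage comments are placeholders.

```shell
#!/usr/bin/env bash
# ambari_service_url: hypothetical helper that builds the REST URL of a service.
# $1 = Ambari host:port, $2 = cluster name, $3 = service name
ambari_service_url() {
    printf 'http://%s/api/v1/clusters/%s/services/%s\n' "$1" "$2" "$3"
}

# mounted_nodes: hypothetical helper that reads "mmlsmount all -L" output on stdin
# and prints "<file system> <node count>" for every summary line.
mounted_nodes() {
    awk '/^File system .* is mounted on/ { print $3, $7 }'
}

# Example (placeholders: mn01-dat:8080, mycluster, admin:admin):
#   url=$(ambari_service_url mn01-dat:8080 mycluster GPFS)
#   curl -u admin:admin -H "X-Requested-By: ambari" -X GET "$url"     # inspect first
#   curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE "$url"  # then delete
#   /usr/lpp/mmfs/bin/mmlsmount all -L | mounted_nodes
```

Issuing a GET against the same URL first shows whether the cluster name is spelled correctly before anything is deleted.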

H. IBM Spectrum Scale UI

The IBM Spectrum Scale summary page in Ambari contains a Quick Links menu with an item that opens the IBM Spectrum Scale UI in a new tab. The IBM Spectrum Scale GUI is not installed or configured by Ambari. There is merely a link to the UI if the administrator wants to set it up.


If you are running IBM Spectrum Scale 4.2.0 or later, the RPMs required to install the GUI are included in the Standard and Advanced Editions for Linux on x86 and Power (Big Endian or Little Endian). The GUI requires RHEL 7. Installation instructions are available in IBM Knowledge Center: https://www-01.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.ins.doc/bl1ins_manualinstallofgui.htm

If you are running IBM Spectrum Scale 4.1, there is an Open Beta of the GUI available here: https://www.ibm.com/developerworks/servicemanagement/tc/gpfs/evaluate.html

Ambari assumes that the GUI is running on the same node as the IBM Spectrum Scale Master.

If you are using ESS, then Ambari assumes that the ESS GUI has been installed on the node specified in /var/lib/ambari-server/resources/shared_gpfs_node.cfg.

The host and the port that Ambari links to can be configured by setting gpfs.webui.address in the gpfs-advance configuration. If this value is changed after the initial cluster deployment, refresh the browser window where the Ambari GUI is running so that the change can take effect.

I. Collecting the snap data

You can collect IBM Spectrum Scale snap data from the Ambari GUI. The command is run by the IBM Spectrum Scale Master, and the snap data is saved to /var/log/ambari.gpfs.snap.<timestamp> on the IBM Spectrum Scale Master node.


It is also possible to override the default behavior of this snap by providing the arguments to be given to the gpfs.snap command in the file /var/lib/ambari-server/resources/gpfs.snap.args.

By default, the IBM Spectrum Scale Master will run the following command:

/usr/lpp/mmfs/bin/gpfs.snap -d /var/log/ambari.gpfs.snap.<timestamp> -N <all nodes> --check-space --timeout 600

Where <all nodes> is the list of nodes in the IBM Spectrum Scale cluster and in the Ambari cluster. The external nodes in a shared cluster, such as ESS servers, are not included.

To override these default arguments, specify the arguments to be passed to gpfs.snap in /var/lib/ambari-server/resources/gpfs.snap.args. For example, to write the snap data to a different location, collect snap data from all nodes in the cluster, and increase the timeout, you can provide a gpfs.snap.args file similar to the following:

[root@mn01]# cat /var/lib/ambari-server/resources/gpfs.snap.args
-d /root/gpfs.snap.out -a --timeout 1200

You can see the output from the snap command and learn which directory the snap data was written to by looking at the output file from Ambari.

FIGURE 31 AMBARI COLLECT SNAP DATA
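The override rule can be summarized as: if the args file exists and is not empty, its content replaces the default arguments entirely. The following sketch illustrates that rule only; it is not the code that Ambari runs. The function name snap_args, the timestamp value, and the comma-separated node list are assumptions.

```shell
#!/usr/bin/env bash
# snap_args: hypothetical illustration of the default-versus-override behavior.
# $1 = path of the args file, $2 = timestamp string, $3 = node list for -N
snap_args() {
    if [ -s "$1" ]; then
        # a non-empty args file replaces the defaults completely
        cat "$1"
    else
        echo "-d /var/log/ambari.gpfs.snap.$2 -N $3 --check-space --timeout 600"
    fi
}

# Example (timestamp format and node list are assumptions):
#   echo "/usr/lpp/mmfs/bin/gpfs.snap $(snap_args /var/lib/ambari-server/resources/gpfs.snap.args 20160328 compute001,compute002)"
```

Because the file replaces the defaults as a whole, an override that only changes the timeout must still repeat the -d and node selection arguments that it wants to keep.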


J. HTTPS/REST API

See References: HTTPS/REST API Hadoop Connector for information on the configuration.

K. Resources

Description                                                            URL
Common Problems and Solutions for IBM Spectrum Scale Hadoop Connector  References: Troubleshooting: Hadoop Connector
Latest IBM Spectrum Scale Hadoop Connector                             References: Hadoop Connector


FAQ

What IBM Spectrum Scale edition is required for the Ambari deployment?

If you want to perform a new installation (including cluster creation, file system creation, and so on), you need the Standard or Advanced Edition because the IBM Spectrum Scale file system policy is used by default. If you only have the Express Edition, select the Deploy the IOP over existing IBM Spectrum Scale cluster mode.

Why do I fail in registering the Ambari agent?

Run ps -elf | grep ambari on the failing agent node to see what is running. Usually, while the agent node is registering, there should be nothing under /etc/yum.repos.d/. If there is an additional repository that does not work because of an incorrect path or yum server address, the Ambari agent registration fails.

Which yum repository must be under /etc/yum.repos.d?

Before registering, /etc/yum.repos.d on the Ambari server node contains only the one Ambari repository file that you created in section 3.1. On the Ambari agent, there must be no repository files related to Ambari. After the Ambari agent has been registered successfully, the Ambari server copies the Ambari repository to all Ambari agents. After that, the Ambari server creates the IOP and IOP-UTILS repositories on the Ambari server and agents, according to your specification in the Ambari GUI in section 4.3.

If you interrupt the Ambari deployment, you must clean these files before starting Ambari the next time, especially when you specify a different IBM Spectrum Scale, IOP, or IOP-UTILS yum URL.

Must all nodes have the same root password?

No, this is unnecessary. You only need to specify the ssh key file for root on the Ambari server.

Why did the Ambari services show failure to start up after the installation?

If you are using hadoop-gpfs-connector-2.7.0.3 or later, expect the installation to succeed but the services to fail to start. This is normal: a recent security enhancement in the connector requires a minor configuration change. Once configured, all services must start and pass the service check.

Why did the MapReduce services failed?

Page 81: Deployment Guide: IBM® BigInsights™with IBM® Spectrum Scale

81/86

Look for the ambari-qa folder in the DFS user directory. If it does not exist, create it. If this step is skipped, the MapReduce service check fails with a "/user/ambari-qa path not found" error.

As root (replace <gpfs mount> with your IBM Spectrum Scale mount point, for example /gpfs/hadoopfs):

mkdir -p <gpfs mount>/user/ambari-qa

chown ambari-qa:hadoop <gpfs mount>/user/ambari-qa
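If you repeat this step on several clusters, the two commands can be wrapped so that they are safe to rerun. This is a sketch only; the function name ensure_user_dir is hypothetical, and the mount point, user, and group are passed in rather than assumed.

```shell
#!/bin/sh
# ensure_user_dir MOUNT USER GROUP
# Create MOUNT/user/USER if it is missing and give it to USER:GROUP.
# Hypothetical helper for illustration only.
ensure_user_dir() {
    mount_point="$1"
    user="$2"
    group="$3"
    target="$mount_point/user/$user"
    # mkdir -p does not fail if the directory already exists.
    mkdir -p "$target" || return 1
    chown "$user:$group" "$target" || return 1
    # Show the resulting owner, group, and mode.
    ls -ld "$target"
}

# Example, as root: ensure_user_dir /gpfs/hadoopfs ambari-qa hadoop
```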

How do I check the superuser and the supergroup?

Connector version hadoop-gpfs-2.7.0-3 and later add security controls to support multiple user groups. Normally, just one superuser, “hdfs”, and one supergroup, “hadoop”, are used. The IDs that can access the distributed file system via HDFS are determined by the permissions and ACLs defined on /var/run/ibm_bigpfs_gcd. To see the current superuser and supergroup:

ls -alt /var/run/ibm_bigpfs_gcd

srw-------. 1 hdfs hadoop 0 Dec 10 21:17 /var/run/ibm_bigpfs_gcd
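The same check can be scripted, for example to compare every node against the expected IDs. The sketch below assumes GNU stat (as shipped on Linux); the function name check_superuser is hypothetical.

```shell
#!/bin/sh
# check_superuser PATH USER GROUP
# Print the owner and group of PATH, and succeed only if they are
# USER and GROUP. Returns 2 if PATH cannot be examined.
# Hypothetical helper for illustration only; relies on GNU stat.
check_superuser() {
    owner=$(stat -c '%U:%G' "$1") || return 2
    echo "$1 is owned by $owner"
    [ "$owner" = "$2:$3" ]
}

# Example: check_superuser /var/run/ibm_bigpfs_gcd hdfs hadoop
```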

How do I set user permissions in the file system?

Create the following directory to support the new connector, if it does not already exist:

mkdir /var/mmfs/bi; chown hdfs:hadoop /var/mmfs/bi; chmod 660 /var/mmfs/bi

The commands above assume that the HDFS superuser is hdfs and the supergroup is hadoop.

a) To allow a specific set of users to access the DFS via ACLs, perform the following on all nodes:

Note: For HDFS ACL support, install the following RPM packages on all nodes: acl and libacl to enable Hadoop ACL support, and libattr to enable Hadoop extended attributes.

Note: Fine-grained control requires extensive testing with your applications. If a user ID is not authorized to see the DFS through HDFS APIs, the error is:

java.io.IOException: GPFSC00023E: Unable to establish communication with file system

at org.apache.hadoop.fs.gpfs.GeneralParallelFileSystem.lockNativeRootAction(GeneralParallelFileSystem.java:2786)

at org.apache.hadoop.fs.gpfs.GeneralParallelFileSystem.getFileStatus(GeneralParallelFileSystem.java:799)

On every node:

yum install -y acl libacl libattr

To see the ACLs currently set:

getfacl /var/run/ibm_bigpfs_gcd

# file: ibm_bigpfs_gcd

# owner: root

# group: root

user::rwx

group::---

other::---

b) To allow hdfs (typical super user in HDFS) to have full access to DFS, on all nodes:

setfacl -m "u:hdfs:rwx" /var/run/ibm_bigpfs_gcd

c) To allow any service ID that is a member of the hadoop group (e.g. Hadoop service IDs) to have full access to DFS, on all nodes:

setfacl -m "g:hadoop:rwx" /var/run/ibm_bigpfs_gcd
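After running setfacl, you may want to confirm that the entry is present on each node. The sketch below filters getfacl output for one exact entry; the function name has_acl_entry is hypothetical. Because it matches whole lines, an entry that getfacl annotates with an #effective comment is reported as absent.

```shell
#!/bin/sh
# has_acl_entry ENTRY
# Read getfacl output on standard input and succeed if one line is
# exactly ENTRY, for example user:hdfs:rwx or group:hadoop:rwx.
# Hypothetical helper for illustration only.
has_acl_entry() {
    # -F: fixed string, -x: match the whole line, -q: no output.
    grep -Fxq -- "$1"
}

# Example:
#   getfacl /var/run/ibm_bigpfs_gcd 2>/dev/null | has_acl_entry 'user:hdfs:rwx' \
#       && echo "hdfs has full access"
```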

Why is the Ambari GUI displaying the Service down message when the service process is active on the target node?

a) Check whether the file /var/lib/ambari-agent/data/structured-out-status.json has a length of 0 bytes. If it does, remove the structured-out-status.json file.

b) Check the space usage of the file system where the json file resides. Free up space on the file system if the file system is full.
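Both checks above can be combined into one command to run on the target node. This is a minimal sketch; the function name check_status_file is hypothetical, and it only reports the state of the file without removing anything.

```shell
#!/bin/sh
# check_status_file [PATH]
# Report whether the status file is missing, empty (0 bytes), or ok,
# then show the space usage of the file system that holds it.
# Hypothetical helper for illustration only.
check_status_file() {
    status_file="${1:-/var/lib/ambari-agent/data/structured-out-status.json}"
    if [ ! -e "$status_file" ]; then
        echo "missing: $status_file"
    elif [ ! -s "$status_file" ]; then
        # -s is true only for files larger than 0 bytes.
        echo "empty: $status_file"
    else
        echo "ok: $status_file"
    fi
    # df -P prints one portable-format line per file system.
    df -P "$(dirname "$status_file")"
}

# Example: check_status_file
```

If the report is "empty", remove the file as described in step a).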

Notices

This information was developed for products and services that are offered in the USA.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to:

IBM Director of Licensing
IBM Corporation
North Castle Drive, MD-NC119
Armonk, NY 10504-1785
United States of America

For license inquiries regarding double-byte character set (DBCS) information, contact the IBM Intellectual Property Department in your country or send inquiries, in writing, to:

Intellectual Property Licensing
Legal and Intellectual Property Law
IBM Japan Ltd.
19-21, Nihonbashi-Hakozakicho, Chuo-ku
Tokyo 103-8510, Japan

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law:

INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM websites are provided for convenience only and do not in any manner serve as an endorsement of those websites. The materials at those websites are not part of the materials for this IBM product and use of those websites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of information between independently created programs and other programs (including this one) and (ii) the mutual use of the information which has been exchanged, should contact:

IBM Director of Licensing
IBM Corporation
North Castle Drive, MD-NC119
Armonk, NY 10504-1785
US

Such information may be available, subject to appropriate terms and conditions, including in some cases, payment of a fee.

The licensed program described in this document and all licensed material available for it are provided by IBM under terms of the IBM Customer Agreement, IBM International Program License Agreement or any equivalent agreement between us.

Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

Statements regarding IBM's future direction or intent are subject to change or withdrawal without notice, and represent goals and objectives only.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE:

This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. The sample programs are provided "AS IS", without warranty of any kind. IBM shall not be liable for any damages arising out of your use of the sample programs.

Each copy or any portion of these sample programs or any derivative work, must include a copyright notice as follows:

Portions of this code are derived from IBM Corp. Sample Programs. © Copyright IBM Corp. 2016. All rights reserved.

Trademarks

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at "Copyright and trademark information" (www.ibm.com/legal/copytrade.shtml). Java™ and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.

Terms and conditions for product documentation

Permissions for the use of these publications are granted subject to the following terms and conditions.

Applicability

These terms and conditions are in addition to any terms of use for the IBM website.

Personal use

You may reproduce these publications for your personal, noncommercial use provided that all proprietary notices are preserved. You may not distribute, display or make derivative work of these publications, or any portion thereof, without the express consent of IBM.

Commercial use

You may reproduce, distribute and display these publications solely within your enterprise provided that all proprietary notices are preserved. You may not make derivative works of these publications, or reproduce, distribute or display these publications or any portion thereof outside your enterprise, without the express consent of IBM.

Rights

Except as expressly granted in this permission, no other permissions, licenses or rights are granted, either express or implied, to the publications or any information, data, software or other intellectual property contained therein.

IBM reserves the right to withdraw the permissions granted herein whenever, in its discretion, the use of the publications is detrimental to its interest or, as determined by IBM, the above instructions are not being properly followed.

You may not download, export or re-export this information except in full compliance with all applicable laws and regulations, including all United States export laws and regulations.

IBM MAKES NO GUARANTEE ABOUT THE CONTENT OF THESE PUBLICATIONS. THE PUBLICATIONS ARE PROVIDED "AS-IS" AND WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO IMPLIED WARRANTIES OF MERCHANTABILITY, NON-INFRINGEMENT, AND FITNESS FOR A PARTICULAR PURPOSE.

IBM Online Privacy Statement

IBM Software products, including software as a service solutions, (“Software Offerings”) may use cookies or other technologies to collect product usage information, to help improve the end user experience, to tailor interactions with the end user, or for other purposes. In many cases no personally identifiable information is collected by the Software Offerings. Some of our Software Offerings can help enable you to collect personally identifiable information. If this Software Offering uses cookies to collect personally identifiable information, specific information about this offering’s use of cookies is set forth below.

This Software Offering does not use cookies or other technologies to collect personally identifiable information.

If the configurations deployed for this Software Offering provide you as customer the ability to collect personally identifiable information from end users via cookies and other technologies, you should seek your own legal advice about any laws applicable to such data collection, including any requirements for notice and consent.

For more information about the use of various technologies, including cookies, for these purposes, see IBM’s Privacy Policy at http://www.ibm.com/privacy and IBM’s Online Privacy Statement at http://www.ibm.com/privacy/details in the section entitled “Cookies, Web Beacons and Other Technologies”, and the “IBM Software Products and Software-as-a-Service Privacy Statement” at http://www.ibm.com/software/info/product-privacy.