Hadoop Cluster - Basic OS Setup Insights
Presented by: Sruthi Kumar Annamnidu, Hadoop Administrator
[email protected]
@SkumarAnnamnidu
http://sruthikumara.wordpress.com/

BitRefinery
Hosting IaaS provider based in Denver, CO
Started in 2008, primarily VMware/DR
6 data centers offering VMware (3 Hadoop)
Over 3,000 VMs in production with over 50 TB of data

Agenda
Goal
End Result
Infrastructure
Process
Features to consider
Public Key Authentication (PKA)
Snapshot / Recovery
Other features
CDH Installation
Hadoop Stack
Questions

Goal
Store data in the Hadoop Distributed File System (HDFS), analyze structured and unstructured data, and evaluate the Hadoop ecosystem

End Result
Create a Hadoop cluster with multiple servers (nodes) and install services such as Cloudera Manager, Hadoop, Hive, Sqoop, and Oozie on the cluster

Infrastructure - Hardware
Multiple physical servers with different specifications
Example: i3 to i7 CPUs at 1.6-3.4 GHz with 2-4 cores, 8-32 GB RAM, and 500 GB to 2 x 1 TB hard drives

Infrastructure - Software
All physical servers run the same operating system: CentOS 6.5

Process
Operating System: CentOS (via USB)
◦ Insert the flash drive into the server
◦ Select Install -> English -> U.S. English -> Basic Storage Devices
◦ Key in the hostname (cdh01-01, cdh01-02, etc.)
◦ Select America/Denver for the time zone
◦ Key in the root password
◦ Hit ‘Use Anyway’ if prompted
◦ Select Use All Space
◦ Select all drives and move them to the Install Target Devices section using the right arrow

Process Cont…
Operating System: CentOS (via USB)
◦ Select Write Changes to Disk -> Close
◦ Turn off the server, remove the flash drive, and restart the server
◦ Click Forward on the welcome screen and answer Yes to the license information
◦ Key in the username cdh, with the same password on all servers
◦ Check the ‘Synchronize date…’ option -> Forward
◦ Leave the defaults on the Kdump screen -> Finish
◦ Answer Yes on the ‘changing kdump…’ screen, then OK to reboot

Process Cont…
Operating System: CentOS (via USB)
Set enabled=0
◦ /etc/yum/pluginconf.d/fastestmirror.conf
Navigate to System -> Administration -> Software Update to update the software (make sure you are not upgrading to CentOS 7)
Verify internet connectivity from the Firefox browser
As root, issue ‘init 0’ or ‘shutdown -h now’ to shut down the server
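The enabled=0 change to the fastestmirror plugin file can also be made from a shell instead of an editor. The following is a minimal sketch, assuming a POSIX shell; the function name disable_fastestmirror is hypothetical, and the default path is the one named on the slide.

```shell
#!/bin/sh
# Sketch: force "enabled=0" in a yum plugin configuration file.
# Pass the path of a scratch copy as $1 to try it without touching the real file.
disable_fastestmirror() {
    conf="${1:-/etc/yum/pluginconf.d/fastestmirror.conf}"
    tmp="$(mktemp)" || return 1
    # Rewrite any existing "enabled=..." line; every other line is left as it is.
    sed 's/^enabled[[:space:]]*=.*/enabled=0/' "$conf" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Write back through cat so the file keeps its owner and mode.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}
```

On a node this would be run as root with no argument; the function only rewrites a line that already starts with "enabled".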

Process Cont…
Operating System: CentOS (via USB)
Hostname
User (same on all Nodes)
enabled=0
◦ /etc/yum/pluginconf.d/fastestmirror.conf
Updates (Glib Library, Fonts, YUM backend)
◦ Initial setup

Process - Initial Setup
Update Network Connections (IP address, etc.)
◦ From the System menu -> Preferences -> Network Connections -> select Auto eth0 -> Edit
◦ Check ‘Connect automatically’
◦ IPv4 Settings -> select ‘Manual’ as the method -> Add
◦ Address ‘192.168.a.b’
◦ Netmask ‘255.255.c.d’ (on all servers)
◦ Gateway ‘192.e.f.g’ (on all servers)
◦ DNS Servers ‘192.h.i.j’ (on all servers)
◦ Apply, then issue ‘service network restart’
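The static settings collected by the GUI end up as key/value lines in an ifcfg file. The sketch below prints such a file. render_ifcfg is a hypothetical helper. It assumes ONBOOT=yes corresponds to ‘connect automatically’ and BOOTPROTO=none to the ‘Manual’ method. The addresses in the usage comment are made-up examples, not the cluster's real values.

```shell
#!/bin/sh
# Sketch: print a static-IP interface file in the CentOS 6 ifcfg format.
# Arguments: device, IP address, netmask, gateway, DNS server.
render_ifcfg() {
    device="$1"; ipaddr="$2"; netmask="$3"; gateway="$4"; dns1="$5"
    cat <<EOF
DEVICE=$device
ONBOOT=yes
BOOTPROTO=none
IPADDR=$ipaddr
NETMASK=$netmask
GATEWAY=$gateway
DNS1=$dns1
EOF
}

# Usage with hypothetical addresses:
#   render_ifcfg eth0 192.168.1.11 255.255.255.0 192.168.1.1 192.168.1.1
```

The output can be compared against the file the GUI wrote under /etc/sysconfig/network-scripts/ before restarting the network service.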

Process - Initial Setup Cont…
Add user to root group (usermod -G root <UN>)
◦ groups
Install openssh-server (yum install openssh-server)
◦ As root, issue ‘chkconfig sshd on’ and ‘service sshd restart’
Change the hostname in /etc/sysconfig/network from the default to the actual hostname
◦ Default is localhost.localdomain
◦ Add ‘GATEWAY=192.e.f.g’ as the last line
◦ Issue ‘service network restart’
Update /etc/hosts with IP address, host name, and alias (optional)
Set SELINUX=disabled (from enforcing) in /etc/sysconfig/selinux
◦ Issue ‘service network restart’
◦ Reboot the node
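/etc/sysconfig/network and /etc/sysconfig/selinux are plain KEY=VALUE files, so the hostname, gateway, and SELinux edits can be scripted the same way. The following is a minimal sketch; set_kv is a hypothetical helper, and the values in the usage comments are examples only.

```shell
#!/bin/sh
# Sketch: set KEY=VALUE in a sysconfig-style file.
# An existing KEY line is replaced; otherwise the pair is appended as the last line.
# Values containing "|" or "&" are not handled by this simple sed expression.
set_kv() {
    file="$1"; key="$2"; value="$3"
    tmp="$(mktemp)" || return 1
    if grep -q "^${key}=" "$file"; then
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$tmp"
    else
        cat "$file" > "$tmp"
        printf '%s=%s\n' "$key" "$value" >> "$tmp"
    fi
    # Write back through cat so the file keeps its owner and mode.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage as root, with example values:
#   set_kv /etc/sysconfig/network HOSTNAME cdh01-01
#   set_kv /etc/sysconfig/selinux SELINUX disabled
```

Because the match is anchored on "KEY=", setting SELINUX does not touch the separate SELINUXTYPE line in the same file.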

Process - Initial Setup Cont…
Disable IPv6 (/etc/sysctl.conf)
# To disable ipv6 - added on <DATE> by <Who>
# Notice that 1 is for disable and 0 is for enable
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
Issue ‘service network restart’
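The sysctl entries can be appended by a small script so that reruns do not duplicate them. This is a sketch under one stated assumption: per the kernel's ip-sysctl documentation, disable_ipv6 is a boolean where 1 turns IPv6 off and 0 leaves it on. add_ipv6_disable is a hypothetical name.

```shell
#!/bin/sh
# Sketch: append the three IPv6-disable keys to a sysctl file if they are missing.
# A value of 1 disables IPv6 for that scope; 0 leaves it enabled.
add_ipv6_disable() {
    file="${1:-/etc/sysctl.conf}"
    for key in net.ipv6.conf.all.disable_ipv6 \
               net.ipv6.conf.default.disable_ipv6 \
               net.ipv6.conf.lo.disable_ipv6; do
        # Skip keys that already have an entry so reruns add nothing.
        grep -q "^${key}[[:space:]]*=" "$file" 2>/dev/null || \
            printf '%s = 1\n' "$key" >> "$file"
    done
}
```

After running it as root, ‘sysctl -p’ reloads the file. An existing entry with a different value is left alone, so it still needs a manual check.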

Process - Initial Setup Cont…
Blacklist ipv6
# to disable ipv6 - added by <> on <Date>
blacklist ipv6
blacklist net-pf-10
Issue commands
◦ service ip6tables stop
◦ chkconfig ip6tables off
◦ service network restart

Process - Initial Setup Cont…
Provide user with sudo access (visudo)
◦ Add ‘%adm ALL=(ALL) NOPASSWD: ALL’ after the ‘# %wheel’ line of the ‘Same thing without a password’ section
◦ Add ‘cdh ALL=(ALL) ALL’ after the ‘root’ line of the ‘Allow root to run any commands anywhere’ section
Issue commands
◦ service iptables stop
◦ chkconfig iptables off

Process - Initial Setup Cont…
Set BOOTPROTO to ‘none’ in /etc/sysconfig/network-scripts/ifcfg-Auto_eth0
Set NETWORKING_IPV6=no in /etc/sysconfig/network
Set vm.swappiness=10 (/etc/sysctl.conf)
Update /etc/hosts file on all Nodes
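Because every node needs the same /etc/hosts entries, a quick check for missing names can catch a node that was skipped. The following is a minimal sketch; missing_hosts is a hypothetical helper, and the node names in the usage comment follow the cdh01-NN pattern used during installation.

```shell
#!/bin/sh
# Sketch: print each given name that has no entry in a hosts file.
# A name counts as present when it appears as a whole field after the address
# on a line that is not a comment.
missing_hosts() {
    hosts_file="$1"; shift
    for name in "$@"; do
        awk -v n="$name" '
            $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }
        ' "$hosts_file" || echo "$name"
    done
}

# Usage:
#   missing_hosts /etc/hosts cdh01-01 cdh01-02
```

Empty output means every listed name resolves from the file; any printed name still needs a line on that node.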

Process - Initial Setup Cont…
Java installation
◦ After downloading from the Oracle website, issue commands
    rpm -ivh <>.rpm
    rpm -Uvh <>.rpm
◦ For each user, .bash_profile must be updated
◦ echo $JAVA_HOME
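The .bash_profile update amounts to exporting JAVA_HOME and putting its bin directory on the PATH. The following is a sketch that appends those lines once per profile; add_java_home is a hypothetical helper, and the /usr/java/default path in the usage comment is an assumption about where the Oracle RPM places the JDK, so substitute the actual install directory.

```shell
#!/bin/sh
# Sketch: append JAVA_HOME and PATH exports to a profile file, once.
add_java_home() {
    profile="$1"; java_home="$2"
    # Do nothing if the profile already exports JAVA_HOME.
    grep -q '^export JAVA_HOME=' "$profile" 2>/dev/null && return 0
    {
        printf 'export JAVA_HOME=%s\n' "$java_home"
        # Single quotes keep $JAVA_HOME and $PATH literal in the profile.
        printf 'export PATH=$JAVA_HOME/bin:$PATH\n'
    } >> "$profile"
}

# Usage (path is an assumption, adjust to the real JDK location):
#   add_java_home "$HOME/.bash_profile" /usr/java/default
```

After logging in again, ‘echo $JAVA_HOME’ should print the directory that was passed in.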

Features to consider
Installing CentOS 6.5 from flash drive
◦ Issue with a USB 2.0 drive in a USB 3.0 port on the server
Disabling IPv6 in CentOS 6.5
◦ For the disable_ipv6 keys, 1 disables and 0 enables
History command
◦ Add “export HISTTIMEFORMAT='%F %T '” to .bash_profile
Mount NTFS external hard drive
◦ Install Samba client, cifs-utils

Public Key Authentication (PKA)
Also called passwordless authentication
ping <ip address>
ping <Host Name>
ping <second alias>

Public Key Authentication (PKA) Cont…
Passwordless authentication or PKA
◦ 3 main steps
    Generate SSH keys
    Copy Public Key
    SSH

Public Key Authentication (PKA) Cont…
Generate SSH keys
◦ ssh-keygen -t rsa -P "pass phrase"
    .ssh directory
    Public Key (id_rsa.pub)
    Private Key (id_rsa)

Public Key Authentication (PKA) Cont…
drwx------ 2 root root 4096 Aug 29 22:33 .ssh
-rw------- 1 root root 1675 Aug 29 22:33 id_rsa
-rw-r--r-- 1 root root 395 Aug 29 22:33 id_rsa.pub

Public Key Authentication (PKA) Cont…
Copy Public Key
◦ ssh-copy-id -i /<>/id_rsa.pub <UN>@<HN>
    authorized_keys
    known_hosts
    Adds only first 2 entries from /etc/hosts
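With more than a couple of nodes, the copy step is usually repeated in a loop. The sketch below prints one ssh-copy-id command per node by default, so the list can be reviewed before anything runs. copy_key_to_nodes and the RUN switch are hypothetical names introduced for illustration, and the user, key path, and hosts in the usage comment are examples.

```shell
#!/bin/sh
# Sketch: print (or run) ssh-copy-id for each node.
# Arguments: user, public key path, then one or more host names.
# Set RUN=1 in the environment to execute instead of printing.
copy_key_to_nodes() {
    user="$1"; pubkey="$2"; shift 2
    for host in "$@"; do
        if [ "${RUN:-0}" = "1" ]; then
            ssh-copy-id -i "$pubkey" "$user@$host"
        else
            echo "ssh-copy-id -i $pubkey $user@$host"
        fi
    done
}

# Usage (dry run):
#   copy_key_to_nodes cdh ~/.ssh/id_rsa.pub cdh01-01 cdh01-02
```

Each real run prompts once for the remote user's password; later logins with the key should not.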

Public Key Authentication (PKA) Cont…
-rw------- 1 root root 395 Aug 29 23:45 authorized_keys
-rw-r--r-- 1 root root 404 Aug 29 23:45 known_hosts
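The directory listings show the modes the SSH files ended up with: 700 on .ssh, 600 on id_rsa and authorized_keys, and 644 on id_rsa.pub and known_hosts. The following is a sketch that reapplies exactly those modes, for example after files were copied by hand; fix_ssh_perms is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: apply the modes shown in the listings to an SSH directory.
# Defaults to the current user's ~/.ssh when no directory is given.
fix_ssh_perms() {
    dir="${1:-$HOME/.ssh}"
    chmod 700 "$dir" || return 1
    # Private material: readable and writable by the owner only.
    for f in id_rsa authorized_keys; do
        if [ -f "$dir/$f" ]; then chmod 600 "$dir/$f"; fi
    done
    # Public material: world-readable, owner-writable.
    for f in id_rsa.pub known_hosts; do
        if [ -f "$dir/$f" ]; then chmod 644 "$dir/$f"; fi
    done
    return 0
}
```

Files that do not exist are skipped, so the function can be run on a node before or after the key copy.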

Public Key Authentication (PKA) Cont…
SSH
◦ ssh <ip address>
    Passwordless entry
◦ ssh <Host Name from /etc/hosts>
    Passwordless entry
◦ ssh <Third entry from /etc/hosts>
    Warning: Permanently added, key password
    Passwordless entry

Public Key Authentication (PKA) Cont…
ssh <UN>@<ip address>
ssh <UN>@<Host Name>
ssh <UN>@<Third Entry>
ssh <ip address>
ssh <Host Name>
ssh <Third Entry>

Public Key Authentication (PKA) Cont…
Key Pair issue
◦ Permission denied (publickey,gssapi-keyex,gssapi-with-mic)

Public Key Authentication (PKA) Cont…
Key Pair issue - NOT preferred resolution

Public Key Authentication (PKA) Cont…
Error: Permission denied, please try again

Snapshot / Recovery
Snapshot of Nodes
◦ Used ‘Acronis True Image 2012’
    Around 12 hours to take a snapshot (for example, 1 TB to 380 GB)
    Full backup method, Normal compression level
Recovery of Snapshots
◦ Used ‘Acronis True Image 2012’
    Around 12 hours to recover
    Recover whole disks and partitions
Connected to LAN
◦ Verified: Network Connections, /etc/hosts, ping, SSH, reboot history

Other Features
Clonezilla
FSArchiver
◦ All Partitions and MBR
OpenZFS
Kickstart
NIC information
LVM (Logical Volume Manager)
NTP (Network Time Protocol)

Other Features Cont…
Accessing all Nodes from remote locations
◦ What is my ip
◦ Unix VNC
Copied folder structure
◦ ‘find / -ls > <Location>’
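Recording the folder structure with find -ls gives a text file that can later be compared against a fresh listing. The following is a minimal sketch; record_tree is a hypothetical helper, and the paths in the usage comment are examples.

```shell
#!/bin/sh
# Sketch: write a "find -ls" listing of a directory tree to a file.
# Keep the output file outside the tree being listed, or it will list itself.
record_tree() {
    root="$1"; out="$2"
    # Error messages (for example unreadable directories) are dropped so the
    # file holds only the listing itself.
    find "$root" -ls > "$out" 2>/dev/null
    return 0
}

# Usage with example paths:
#   record_tree /etc /tmp/etc-before.txt
```

Two listings taken at different times can then be compared with diff, keeping in mind that the inode and timestamp columns change whenever files are rewritten.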

CDH Installation
CDH 5.4.0 installation
◦ Cloudera Express
    HDFS, HBase, ZooKeeper, Oozie, Hive, Hue, Flume, Cloudera Impala, Sentry, Sqoop, Cloudera Search, and Spark
    Kafka, Sqoop Netezza Connector, Sqoop Teradata Connector
    Other services such as YARN and HCatalog

CDH Installation Cont…
All these services were installed because the ‘Core Hadoop’ option was selected on the ‘Cluster Setup’ page

Hadoop Stack
http://vision.cloudera.com/cloudera-to-release-first-recursive-hadoop-stack/
Spark-on-Hive-on-MapReduce-on-Spark-on-Oozie-on-HBase-on-Hive-on-Spark-on-Flume-on-HDFS-on-Impala-on-Spark-on-Hive-on-MapReduce-on-Spark-on-Oozie-on-HBase-on-Hive-on-Spark-on-Flume-on-HDFS-on-Impala-on-Spark-on-Hive-on-MapReduce-on-Spark-on-Oozie-on-HBase-on-Hive-on-Spark-on-Flume-on-HDFS-on-Impala-on-…