Hadoop Cluster - Basic OS Setup Insights

Presented by Sruthi Kumar Annamnidu, Hadoop Administrator
@SkumarAnnamnidu
http://sruthikumara.wordpress.com/

Upload: sruthi-kumar-annamnidu

Post on 11-Apr-2017


Page 2: BitRefinery

Hosting IaaS provider based out of Denver, CO
Started in 2008 - primarily VMware/DR
6 data centers offering VMware (3 Hadoop)
Over 3000 VMs in production with over 50 TB of data

Page 3: Agenda

Goal
End Result
Infrastructure
Process
Features to consider
Public Key Authentication (PKA)

Page 4: Agenda

Snapshot / Recovery
Other features
CDH Installation
Hadoop Stack
Questions

Page 5: Goal

To store data in Hadoop Distributed File System (HDFS), analyze unstructured / structured data, and evaluate the Hadoop ecosystem

Page 6: End Result

Create a Hadoop cluster with multiple servers (nodes) and install services like Cloudera Manager, Hadoop, Hive, Sqoop, Oozie, etc. on the cluster

Page 7: Infrastructure - Hardware

Multiple physical servers with different specifications
Example: i3-i7 with 1.6-3.4 GHz CPU, 2-4 cores, 8-32 GB RAM, 500 GB to 2 x 1 TB hard drives

Page 8: Infrastructure - Software

All physical servers have the same operating system - CentOS 6.5

Page 9: Process

Operating System CentOS (via USB)
Insert the flash drive into the server -> select Install -> English -> U.S. English -> Basic Storage Devices -> key in the hostname (cdh01-01, cdh01-02, etc.) -> select America/Denver for the time zone -> key in the root password -> hit 'Use Anyway' if prompted -> select Use All Space -> select all drives and move them to the install target devices section using the right arrow (continued on the next page)

Page 10: Process Cont…

Operating System CentOS (via USB)
Select 'Write changes to disk' -> Close -> turn off the server -> remove the flash drive -> restart the server -> Forward on the welcome screen -> Yes to the license information -> key in the username as cdh, with the same password on all servers -> check the 'Synchronize date…' option -> Forward -> leave the defaults on the Kdump screen -> hit Finish -> Yes on the 'changing kdump…' screen -> OK to reboot (continued on the next page)

Page 11: Process Cont…

Operating System CentOS (via USB)
Set enabled=0
◦ /etc/yum/pluginconf.d/fastestmirror.conf
Navigate to System -> Administration -> Software Update to update the software (make sure you are not updating to CentOS 7) -> verify internet connectivity from the Firefox browser -> as the root user, issue 'init 0' or 'shutdown -h now' to shut down the server

Page 12: Process Cont…

Operating System CentOS (via USB)
Hostname
User (same on all nodes)
enabled=0
◦ /etc/yum/pluginconf.d/fastestmirror.conf
Updates (Glib library, fonts, YUM backend)
◦ Initial setup

Page 13: Process - Initial Setup

Update Network Connections (IP address etc.)
◦ From the System menu -> Preferences -> Network Connections -> select Auto eth0 -> Edit -> check 'Connect automatically' -> IPv4 Settings -> select Method as 'Manual' -> Add -> Address as '192.168.a.b' -> Netmask '255.255.c.d' (on all servers) -> Gateway as '192.e.f.g' (on all servers) -> DNS Servers as '192.h.i.j' (on all servers) -> Apply -> issue 'service network restart'

Page 14: Process - Initial Setup Cont…

Add the user to the root group (usermod -G root <UN>; use -a -G to keep the user's existing supplementary groups)
◦ Verify with 'groups'
Install openssh-server (yum install openssh-server)
◦ As root, issue 'chkconfig sshd on' and 'service sshd restart'
Change the hostname from the default to the actual hostname in /etc/sysconfig/network
◦ The default is localhost.localdomain
◦ Add 'GATEWAY=192.e.f.g' as the last line
◦ Issue 'service network restart'
Update /etc/hosts with IP address, host name, and alias (optional)
Set SELINUX=disabled (from enforcing) in /etc/sysconfig/selinux
◦ Issue 'service network restart'
◦ Reboot the node
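The /etc/hosts update has to be repeated on every node, so it helps if the step can be re-run without creating duplicate lines. The following is a minimal sketch under that assumption; the function name `add_host_entry` and the sample address and names in the usage comment are hypothetical and do not come from the slides.

```shell
# Append "ip hostname [alias]" to a hosts file unless the hostname is already listed.
# Usage: add_host_entry <hosts-file> <ip> <hostname> [alias]
add_host_entry() {
    local hosts_file=$1 ip=$2 name=$3 alias_name=${4:-}
    # Skip the append if a non-comment line already carries this name.
    if awk -v n="$name" '
        !/^[ \t]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
    ' "$hosts_file"; then
        return 0
    fi
    # Write the new entry; the alias column is optional.
    if [ -n "$alias_name" ]; then
        printf '%s\t%s\t%s\n' "$ip" "$name" "$alias_name" >> "$hosts_file"
    else
        printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
    fi
}

# Example with placeholder values (as root on a real node):
# add_host_entry /etc/hosts 192.168.1.11 cdh01-01 node1
```

Taking the file path as an argument keeps the helper testable on a scratch copy before it touches the real /etc/hosts.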

Page 15: Process - Initial Setup Cont…

Disable IPv6 (/etc/sysctl.conf)
# To disable ipv6 - added on <DATE> by <Who>
# Note: for the disable_ipv6 keys, 1 disables IPv6 and 0 enables it
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
Issue 'service network restart'
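Editing /etc/sysctl.conf by hand on every node is error-prone, so the change can be scripted. The sketch below is one possible approach, not the presenter's method: `set_sysctl_key` is a hypothetical helper that rewrites an existing `key = value` line or appends one. It relies on the kernel's convention that `disable_ipv6 = 1` means IPv6 is disabled.

```shell
# Set "key = value" in a sysctl-style file: rewrite the line if the key
# is already set, otherwise append it. Comment lines are left untouched.
# Usage: set_sysctl_key <file> <key> <value>
set_sysctl_key() {
    local file=$1 key=$2 value=$3 tmp
    tmp=$(mktemp) || return 1
    if awk -v k="$key" -v v="$value" '
        { line = $0; sub(/^[ \t]+/, "", line) }
        # The line starts with the key, followed by optional spaces and "=".
        index(line, k) == 1 && substr(line, length(k) + 1) ~ /^[ \t]*=/ {
            print k " = " v; seen = 1; next
        }
        { print }
        END { if (!seen) print k " = " v }
    ' "$file" > "$tmp"; then
        # Copy the contents back so the original file keeps its owner and mode.
        cat "$tmp" > "$file"
    fi
    rm -f "$tmp"
}

# Example (as root on a real node):
# for scope in all default lo; do
#     set_sysctl_key /etc/sysctl.conf "net.ipv6.conf.$scope.disable_ipv6" 1
# done
# sysctl -p    # reload /etc/sysctl.conf
```

The same helper could also apply the vm.swappiness=10 setting that appears later in the deck.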

Page 16: Process - Initial Setup Cont…

Blacklist IPv6
# to disable ipv6 - added by <> on <Date>
blacklist ipv6
blacklist net-pf-10
Issue commands
◦ service ip6tables stop
◦ chkconfig ip6tables off
◦ service network restart

Page 17: Process - Initial Setup Cont…

Provide the user with sudo access (visudo)
◦ Add '%adm ALL=(ALL) NOPASSWD: ALL' after the '#%wheel' line of the 'Same thing without a password' section
◦ Add 'cdh ALL=(ALL) ALL' after the 'root' line of the 'Allow root to run any commands anywhere' section
Issue commands
◦ service iptables stop
◦ chkconfig iptables off

Page 18: Process - Initial Setup Cont…

Set BOOTPROTO to 'none' in /etc/sysconfig/network-scripts/ifcfg-Auto_eth0
Set NETWORKING_IPV6=no in /etc/sysconfig/network
Set vm.swappiness=10 (/etc/sysctl.conf)
Update the /etc/hosts file on all nodes
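With this many settings spread over several files, a quick read-back check on each node can catch a missed edit. The following is a rough sketch of such a check; `has_setting` is a hypothetical helper name, and it accepts both the `KEY=value` form used under /etc/sysconfig and the `key = value` form used in /etc/sysctl.conf.

```shell
# Return 0 if <file> has an uncommented line that sets <key> to <expected>.
# Usage: has_setting <file> <key> <expected>
has_setting() {
    local file=$1 key=$2 expected=$3
    awk -v k="$key" -v e="$expected" '
        /^[ \t]*#/ { next }                      # ignore comment lines
        {
            eq = index($0, "=")
            if (eq == 0) next                    # not an assignment
            name = substr($0, 1, eq - 1)
            val = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)    # trim spaces around the key
            gsub(/^[ \t]+|[ \t]+$/, "", val)     # trim spaces around the value
            if (name == k && val == e) ok = 1
        }
        END { exit !ok }
    ' "$file"
}

# Example checks for the settings on this page:
# has_setting /etc/sysconfig/network-scripts/ifcfg-Auto_eth0 BOOTPROTO none || echo "BOOTPROTO not set"
# has_setting /etc/sysconfig/network NETWORKING_IPV6 no || echo "NETWORKING_IPV6 not set"
# has_setting /etc/sysctl.conf vm.swappiness 10 || echo "vm.swappiness not set"
```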

Page 19: Process - Initial Setup Cont…

Java installation
◦ After downloading the RPM from the Oracle website, issue one of:
rpm -ivh <>.rpm (install)
rpm -Uvh <>.rpm (upgrade)
◦ For each user, .bash_profile must be updated (to set JAVA_HOME)
◦ Verify with 'echo $JAVA_HOME'
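The slide does not show the lines added to .bash_profile. As an illustration only, the sketch below appends a JAVA_HOME export once per profile; the helper name `add_java_home` and the /usr/java/default path in the usage comment are assumptions, so substitute the directory the RPM actually installed into.

```shell
# Append JAVA_HOME and PATH exports to a profile file, once.
# Usage: add_java_home <profile-file> <java-home-dir>
add_java_home() {
    local profile=$1 java_home=$2
    # Do nothing if the profile already exports JAVA_HOME.
    grep -q '^export JAVA_HOME=' "$profile" 2>/dev/null && return 0
    {
        printf 'export JAVA_HOME=%s\n' "$java_home"
        # Single quotes keep $JAVA_HOME and $PATH literal in the profile.
        printf 'export PATH=$JAVA_HOME/bin:$PATH\n'
    } >> "$profile"
}

# Example (path is an assumption; check where the RPM installed Java):
# add_java_home "$HOME/.bash_profile" /usr/java/default
# . "$HOME/.bash_profile" && echo "$JAVA_HOME"
```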

Page 20: Features to consider

Installing CentOS 6.5 from a flash drive
◦ Issue with a USB 2.0 drive in a USB 3.0 port on the server
Disabling IPv6 in CentOS 6.5
◦ For the disable_ipv6 keys, 1 disables IPv6 and 0 enables it
History command
◦ Add "export HISTTIMEFORMAT='%F %T '" to .bash_profile
Mount NTFS external hard drive
◦ Install Samba client, cifs-utils

Note (Dhana Annamnidu): NTFS needs a FUSE file system driver; check the EPEL or extras repositories. Samba and cifs-utils are generally used to connect to Windows file shares; CIFS is a protocol, also known as SMB.
Page 21: Public Key Authentication (PKA)

Also called passwordless authentication
ping <ip address>
ping <Host Name>
ping <second alias>

Page 22: Public Key Authentication (PKA) Cont…

Passwordless authentication or PKA
◦ 3 main steps
Generate SSH keys
Copy public key
SSH

Page 23: Public Key Authentication (PKA) Cont…

Generate SSH keys
◦ ssh-keygen -t rsa -P "pass phrase"
.ssh directory
Public key (id_rsa.pub)
Private key (id_rsa)

Page 24: Public Key Authentication (PKA) Cont…

drwx------  2 root root 4096 Aug 29 22:33 .ssh
-rw-------  1 root root 1675 Aug 29 22:33 id_rsa
-rw-r--r--  1 root root  395 Aug 29 22:33 id_rsa.pub
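The listing shows the modes that matter for key-based login: 700 on the .ssh directory, 600 on the private key, and 644 on the public key. A small check along these lines could confirm them on each node; the function names are hypothetical, and `stat -c` is the GNU form shipped with CentOS.

```shell
# Compare one path's octal mode with the expected value.
# Usage: check_mode <path> <expected-octal-mode>
check_mode() {
    local actual
    actual=$(stat -c '%a' "$1") || return 1
    if [ "$actual" != "$2" ]; then
        echo "$1: mode $actual, expected $2"
        return 1
    fi
}

# Check an SSH directory and its RSA key pair against the modes in the listing.
# Usage: check_ssh_perms <ssh-dir>
check_ssh_perms() {
    local dir=$1 status=0
    check_mode "$dir" 700 || status=1
    check_mode "$dir/id_rsa" 600 || status=1
    check_mode "$dir/id_rsa.pub" 644 || status=1
    return $status
}

# Example:
# check_ssh_perms /root/.ssh && echo "permissions look fine"
```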

Page 25: Public Key Authentication (PKA) Cont…

Page 26: Public Key Authentication (PKA) Cont…

Copy public key
◦ ssh-copy-id -i /<>/id_rsa.pub <UN>@<HN>
authorized_keys
known_hosts
Adds only the first 2 entries from /etc/hosts

Note (Dhana Annamnidu): talk about /root/.ssh for the root user; for user1, /home/user1/.ssh

Page 27: Public Key Authentication (PKA) Cont…

-rw-------  1 root root 395 Aug 29 23:45 authorized_keys
-rw-r--r--  1 root root 404 Aug 29 23:45 known_hosts
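In effect, ssh-copy-id appends the public key to the remote user's authorized_keys file. To make that step concrete, here is a local approximation that performs the same append on a directory you name; it is a sketch for understanding, not a replacement for ssh-copy-id, and `authorize_key` is a hypothetical name.

```shell
# Append a public key to <ssh-dir>/authorized_keys if it is not already there,
# creating the directory (mode 700) and the file (mode 600) as needed.
# Usage: authorize_key <public-key-file> <ssh-dir>
authorize_key() {
    local pub=$1 dir=$2
    mkdir -p "$dir" && chmod 700 "$dir" || return 1
    touch "$dir/authorized_keys" && chmod 600 "$dir/authorized_keys" || return 1
    # -x matches whole lines only; -F treats the key as a fixed string, not a regex.
    grep -qxF -- "$(cat "$pub")" "$dir/authorized_keys" ||
        cat "$pub" >> "$dir/authorized_keys"
}

# Example (on the target node, as the target user):
# authorize_key /tmp/id_rsa.pub "$HOME/.ssh"
```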

Page 28: Public Key Authentication (PKA) Cont…

SSH
◦ ssh <ip address>: passwordless entry
◦ ssh <Host Name from /etc/hosts>: passwordless entry
◦ ssh <Third entry from /etc/hosts>: 'Warning: Permanently added', key password, then passwordless entry

Page 29: Public Key Authentication (PKA) Cont…

ssh <UN>@<ip address>
ssh <UN>@<Host Name>
ssh <UN>@<Third Entry>
ssh <ip address>
ssh <Host Name>
ssh <Third Entry>
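Trying every name by hand across all nodes gets tedious, so the checks above could be looped. The sketch below reads the names from a hosts file and attempts a non-interactive login to each; the function names are hypothetical. `BatchMode=yes` makes ssh fail instead of prompting, so a name whose host key is not yet in known_hosts is also reported as failed.

```shell
# List every name (hostname and aliases) from a hosts file, one per line,
# skipping comments and the localhost entries.
# Usage: list_host_names <hosts-file>
list_host_names() {
    awk '!/^[ \t]*#/ && NF >= 2 && $1 != "127.0.0.1" && $1 != "::1" {
        for (i = 2; i <= NF; i++) { if ($i ~ /^#/) break; print $i }
    }' "$1"
}

# Try a passwordless login to each name and report the result.
# Usage: check_passwordless <user> <hosts-file>
check_passwordless() {
    local user=$1 hosts_file=$2 name
    for name in $(list_host_names "$hosts_file"); do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$user@$name" true; then
            echo "$name: ok"
        else
            echo "$name: FAILED"
        fi
    done
}

# Example:
# check_passwordless cdh /etc/hosts
```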

Page 30: Public Key Authentication (PKA) Cont…

Key pair issue
◦ Permission denied (publickey,gssapi-keyex,gssapi-with-mic)

Page 31: Public Key Authentication (PKA) Cont…

Key pair issue - NOT preferred resolution

Page 32: Public Key Authentication (PKA) Cont…

Error: Permission denied, please try again

Page 33: Snapshot / Recovery

Snapshot of nodes
◦ Used 'Acronis True Image 2012'
Around 12 hours to take a snapshot (example: 1 TB to 380 GB)
Full backup method, Normal compression level
Recovery of snapshots
◦ Used 'Acronis True Image 2012'
Around 12 hours to recover
Recover whole disks and partitions
Connected to LAN
◦ Verified network connections, /etc/hosts, ping, SSH, reboot history

Page 34: Other Features

Clonezilla
FSArchiver
◦ All partitions and MBR
OpenZFS
Kickstart
NIC information
LVM (Logical Volume Manager)
NTP (Network Time Protocol)

Page 35: Other Features Cont…

Accessing all nodes from remote locations
◦ What is my IP
◦ Unix VNC
Copied folder structure
◦ find / -ls > <Location>
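A saved folder listing becomes more useful when two of them can be compared, for example before and after an installation. The following sketch records paths only (unlike `find -ls`, which also records sizes and timestamps) so that listings diff cleanly; the function names are hypothetical and this comparison step is an extension of the slide, not something it describes.

```shell
# Write a sorted list of every path under <root> to <out-file>.
# Usage: list_tree <root> <out-file>
list_tree() {
    (cd "$1" && find . | LC_ALL=C sort) > "$2"
}

# Print paths that appear in the second listing but not in the first.
# Both listings must be sorted the same way, which list_tree guarantees.
# Usage: new_paths <before-list> <after-list>
new_paths() {
    LC_ALL=C comm -13 "$1" "$2"
}

# Example:
# list_tree /etc /tmp/etc.before
# ... install something ...
# list_tree /etc /tmp/etc.after
# new_paths /tmp/etc.before /tmp/etc.after
```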

Page 36: CDH Installation

CDH 5.4.0 installation
◦ Cloudera Express
HDFS, HBase, ZooKeeper, Oozie, Hive, Hue, Flume, Cloudera Impala, Sentry, Sqoop, Cloudera Search, and Spark
Kafka, Sqoop Netezza Connector, Sqoop Teradata Connector
Other services such as YARN and HCatalog

Page 37: CDH Installation Cont…

Page 38: CDH Installation Cont…

All these services were installed because the 'Core Hadoop' option was selected on the 'Cluster Setup' page

Page 39: Hadoop Stack

http://vision.cloudera.com/cloudera-to-release-first-recursive-hadoop-stack/

Spark-on-Hive-on-MapReduce-on-Spark-on-Oozie-on-HBase-on-Hive-on-Spark-on-Flume-on-HDFS-on-Impala-on-Spark-on-Hive-on-MapReduce-on-Spark-on-Oozie-on-HBase-on-Hive-on-Spark-on-Flume-on-HDFS-on-Impala-on-Spark-on-Hive-on-MapReduce-on-Spark-on-Oozie-on-HBase-on-Hive-on-Spark-on-Flume-on-HDFS-on-Impala-on-…

Page 40: Hadoop Cluster - Basic OS Setup Insights
