Hadoop Cluster - Basic OS Setup Insights


Post on 11-Apr-2017






Hadoop Cluster Design thoughts

Presented by - Sruthi Kumar Annamnidu
Hadoop Administrator
sruthi.kumar.a@gmail.com
@SkumarAnnamnidu
http://sruthikumara.wordpress.com/


BitRefinery

Hosting IaaS provider based out of Denver, CO
Started in 2008; primarily VMware/DR
6 data centers offering VMware (3 Hadoop)
Over 3000 VMs in production with over 50 TB of data

Agenda

Goal
End Result
Infrastructure
Process
Features to consider
Public Key Authentication (PKA)
Snapshot / Recovery
Other features
CDH Installation
Hadoop Stack
Questions

Goal

Store data in the Hadoop Distributed File System (HDFS), analyze structured and unstructured data, and evaluate the Hadoop ecosystem

End Result

Create a Hadoop cluster with multiple servers (nodes) and install services like Cloudera Manager, Hadoop, Hive, Sqoop, Oozie, etc. on the cluster

Infrastructure - Hardware

Multiple physical servers with different specifications
Example: i3-i7 with 1.6-3.4 GHz CPU, 2-4 cores, 8-32 GB RAM, 500 GB to 2 x 1 TB hard drives

Infrastructure - Software

All physical servers have the same operating system: CentOS 6.5

Process

Operating System CentOS (via USB)

Insert flash drive into the server -> select Install -> English -> U.S. English -> Basic Storage Devices -> key in hostname (cdh01-01, cdh01-02, etc.) -> select America/Denver for time zone -> key in root password -> hit Use Anyway if prompted -> select Use All Space -> select all drives and move them to the Install Target Devices section using the right arrow

Process Cont

Operating System CentOS (via USB)

Select Write Changes to Disk -> Close -> turn off the server -> remove flash drive -> restart the server -> Forward on welcome screen -> Yes to license information -> key in username as cdh and the same password on all servers -> check Synchronize date option -> Forward -> leave defaults on Kdump screen -> hit Finish -> Yes on changing kdump screen -> OK to reboot

Process Cont

Operating System CentOS (via USB)

Set enabled=0 in /etc/yum/pluginconf.d/fastestmirror.conf

Navigate to System -> Administration -> Software Update to update the software (make sure you are not updating to CentOS 7) -> verify internet connectivity from the Firefox browser -> as root user, issue init 0 or shutdown -h now to shut down the server
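The enabled=0 change can also be made from the command line. The following is a minimal sketch, not taken from the slides: the function name disable_fastestmirror is hypothetical, and the file path is passed as an argument so the edit can be tried on a copy first.

```shell
# Sketch: switch off the yum fastestmirror plugin by setting enabled=0.
# disable_fastestmirror is a hypothetical helper name, not part of yum.
disable_fastestmirror() {
    conf="$1"
    # Rewrite any existing "enabled=..." line; all other lines are kept.
    sed 's/^enabled *=.*/enabled=0/' "$conf" > "$conf.tmp" &&
        cat "$conf.tmp" > "$conf" &&
        rm -f "$conf.tmp"
}

# Usage (as root on each node):
#   disable_fastestmirror /etc/yum/pluginconf.d/fastestmirror.conf
```

The helper only rewrites a line that already exists; if the file has no enabled= line, nothing changes.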

Process Cont

Operating System CentOS (via USB)

Hostname
User (same on all Nodes)
enabled=0 in /etc/yum/pluginconf.d/fastestmirror.conf
Updates (Glib Library, Fonts, YUM backend)
Initial setup

Process - Initial Setup

Update Network Connections (ip address etc)

From System menu -> Preferences -> Network Connections -> select Auto eth0 -> Edit -> check Connect automatically -> IPv4 Settings -> select method as Manual -> Add -> Address as 192.168.a.b -> Netmask 255.255.c.d (on all servers) -> Gateway as 192.e.f.g (on all servers) -> DNS Servers as 192.h.i.j (on all servers) -> Apply -> issue service network restart

Process - Initial Setup Cont

Add user to root group (usermod -G root <user>)
Check membership with groups
Install openssh-server (yum install openssh-server)
As root, issue chkconfig sshd on and issue service sshd restart
Change hostname from the default localhost.localdomain to the actual hostname in /etc/sysconfig/network
Add GATEWAY=192.e.f.g as last line
Issue service network restart
Update /etc/hosts with ip address, host name, alias (optional)
Set SELINUX=disabled (from enforcing) in /etc/sysconfig/selinux
Issue service network restart
Reboot the node
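Several of the steps above are KEY=VALUE edits in small config files. A minimal sketch of a helper for that follows; set_kv is a hypothetical name, and the sketch assumes values contain no "|" or "&" characters (host names and IP addresses do not).

```shell
# set_kv FILE KEY VALUE: replace an existing KEY=... line, or append one.
# Hypothetical helper; writes back through the original file so that a
# symlinked config (and its permissions) is left in place.
set_kv() {
    file="$1"; key="$2"; value="$3"
    if grep -q "^${key}=" "$file"; then
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$file.tmp" &&
            cat "$file.tmp" > "$file" &&
            rm -f "$file.tmp"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Usage (as root); $gateway stands for the site's own gateway address:
#   set_kv /etc/sysconfig/network HOSTNAME cdh01-01
#   set_kv /etc/sysconfig/network GATEWAY "$gateway"
#   set_kv /etc/sysconfig/selinux SELINUX disabled
```

Because the match is anchored on "KEY=", setting SELINUX does not touch a SELINUXTYPE line in the same file.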

Process - Initial Setup Cont

Disable ipv6 (/etc/sysctl.conf)

#To disable ipv6 - added on by
#Notice that 1 is for disable and 0 is for enable
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

Issue service network restart
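A minimal sketch of adding the sysctl entries without duplicating them on a second run. For the disable_ipv6 keys, a value of 1 turns IPv6 off and 0 leaves it on. The function names append_once and disable_ipv6_sysctl are hypothetical.

```shell
# append_once FILE LINE: add LINE to FILE unless an identical line exists.
append_once() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# disable_ipv6_sysctl FILE: add the three disable_ipv6 = 1 entries.
disable_ipv6_sysctl() {
    conf="$1"
    for scope in all default lo; do
        append_once "$conf" "net.ipv6.conf.${scope}.disable_ipv6 = 1"
    done
}

# Usage (as root):
#   disable_ipv6_sysctl /etc/sysctl.conf
#   sysctl -p                                        # reload /etc/sysctl.conf
#   cat /proc/sys/net/ipv6/conf/all/disable_ipv6     # 1 means IPv6 is off
```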

Process - Initial Setup Cont

Blacklist ipv6

#to disable ipv6 - added by on
blacklist ipv6
blacklist net-pf-10

Issue commands
service ip6tables stop
chkconfig ip6tables off
service network restart

Process - Initial Setup Cont

Provide user with sudo access (visudo)

Add %adm ALL=(ALL) NOPASSWD: ALL after the #%wheel line of the "Same thing without a password" section
Add cdh ALL=(ALL) ALL after the root line of the "Allow root to run any commands anywhere" section

Issue commands
service iptables stop
chkconfig iptables off
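The slide makes these edits interactively through visudo. As a hedged alternative for checking the two lines before touching the real file, the sketch below stages them in a scratch file that visudo -c -f can syntax-check. The function name write_sudo_rules and the scratch path are assumptions for illustration.

```shell
# write_sudo_rules FILE: stage the two sudoers lines from the slide in FILE.
# Hypothetical helper; it does not modify /etc/sudoers.
write_sudo_rules() {
    out="$1"
    printf '%s\n' '%adm ALL=(ALL) NOPASSWD: ALL' 'cdh ALL=(ALL) ALL' > "$out"
    chmod 0440 "$out"
}

# Usage (scratch path is an example only):
#   write_sudo_rules /tmp/cdh-sudoers
#   visudo -c -f /tmp/cdh-sudoers    # check-only mode, reports syntax errors
```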

Process - Initial Setup Cont

Set BOOTPROTO to none in /etc/sysconfig/network-scripts/ifcfg-Auto_eth0
Set NETWORKING_IPV6=no in /etc/sysconfig/network
Set vm.swappiness=10 (/etc/sysctl.conf)
Update /etc/hosts file on all Nodes
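Keeping /etc/hosts identical on every node is easier if entries are added by a small script. A minimal sketch, assuming one "ip name alias" entry per node; add_host is a hypothetical name and the address in the usage lines is a documentation-range example, not a value from the slides.

```shell
# add_host FILE IP NAME [ALIAS]: append a hosts entry unless NAME is present.
add_host() {
    hosts="$1"; ip="$2"; name="$3"; alias="${4:-}"
    # Skip if the host name already appears as a whole word.
    if grep -qw -- "$name" "$hosts" 2>/dev/null; then
        return 0
    fi
    printf '%s\t%s' "$ip" "$name" >> "$hosts"
    if [ -n "$alias" ]; then
        printf '\t%s' "$alias" >> "$hosts"
    fi
    printf '\n' >> "$hosts"
}

# Usage (as root, same commands on every node; the IP is an example):
#   add_host /etc/hosts 192.0.2.11 cdh01-01 node1
```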

Process - Initial Setup Cont

Java installation
After downloading from the Oracle website, issue commands
  rpm -ivh <package>.rpm
  rpm -Uvh <package>.rpm
For each user, .bash_profile must be updated
echo $JAVA_HOME
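The slide does not show the .bash_profile lines themselves. A minimal sketch of what the update might look like follows; the function name add_java_home, the PATH line, and the /usr/java/default location in the usage note are assumptions, so substitute the directory the RPM actually installed into.

```shell
# add_java_home PROFILE JAVA_DIR: append JAVA_HOME exports to PROFILE once.
# Hypothetical helper; the PATH line is an addition not shown on the slide.
add_java_home() {
    profile="$1"; java_home="$2"
    if grep -q '^export JAVA_HOME=' "$profile" 2>/dev/null; then
        return 0
    fi
    {
        printf 'export JAVA_HOME=%s\n' "$java_home"
        printf '%s\n' 'export PATH=$JAVA_HOME/bin:$PATH'
    } >> "$profile"
}

# Usage (per user; the JDK path is an assumed example):
#   add_java_home ~/.bash_profile /usr/java/default
#   . ~/.bash_profile && echo $JAVA_HOME
```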

Features to consider

Installing CentOS 6.5 from flash drive
  USB 2.0 drive in USB 3.0 port on server issue
Disabling ipv6 in CentOS 6.5
  For disable_ipv6, 1 is to disable and 0 is to enable
History command
  export HISTTIMEFORMAT='%F %T ' to .bash_profile
Mount NTFS external hard drive
  Install Samba client, cifs-utils

Dhana Annamnidu (DA) - NTFS needs the FUSE file system. Check the extra repo (or so): check EPEL or extras (or so). Generally samba and cifs-utils are used to connect to a Windows file system; CIFS is a protocol, also known as SMB.

Public Key Authentication (PKA)

Also called passwordless authentication
Ping Ping Ping

Public Key Authentication (PKA) Cont

Passwordless authentication or PKA
3 main steps
  Generate SSH keys
  Copy Public Key
  SSH

Public Key Authentication (PKA) Cont

Generate SSH keys
ssh-keygen -t rsa -P <pass phrase>
.ssh directory
Public Key (id_rsa.pub)
Private Key (id_rsa)

drwx------ 2 root root 4096 Aug 29 22:33 .ssh
-rw------- 1 root root 1675 Aug 29 22:33 id_rsa
-rw-r--r-- 1 root root  395 Aug 29 22:33 id_rsa.pub
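The listing above shows the modes SSH expects: 700 on .ssh, 600 on the private key, 644 on the public key. A minimal sketch of generating a key pair non-interactively and checking those modes follows. Both function names are hypothetical, the empty passphrase is an assumption made for passwordless login, and stat -c is the GNU coreutils form found on CentOS.

```shell
# generate_keypair DIR: create DIR/id_rsa and DIR/id_rsa.pub.
# -P "" sets an empty passphrase (assumption); -q suppresses the banner.
generate_keypair() {
    dir="$1"
    mkdir -p "$dir" && chmod 700 "$dir" &&
        ssh-keygen -q -t rsa -P "" -f "$dir/id_rsa"
}

# check_ssh_perms DIR: succeed only if modes match the listing (700/600/644).
check_ssh_perms() {
    dir="$1"
    [ "$(stat -c %a "$dir")" = "700" ] &&
        [ "$(stat -c %a "$dir/id_rsa")" = "600" ] &&
        [ "$(stat -c %a "$dir/id_rsa.pub")" = "644" ]
}

# Usage:
#   generate_keypair ~/.ssh
#   check_ssh_perms ~/.ssh && echo "permissions ok"
```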


Public Key Authentication (PKA) Cont

Copy Public Key
ssh-copy-id -i <path>/id_rsa.pub <user>@<host>
authorized_keys
known_hosts
Adds only first 2 entries from /etc/hosts

Dhana Annamnidu (DA) - Talk about: /root/.ssh for root user. For User1, /home/user1/.ssh

-rw------- 1 root root 395 Aug 29 23:45 authorized_keys
-rw-r--r-- 1 root root 404 Aug 29 23:45 known_hosts
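On the receiving node, ssh-copy-id ends up appending the public key to authorized_keys in that user's .ssh directory. A minimal sketch of the same step done by hand, for cases where ssh-copy-id is not available; install_pubkey is a hypothetical name, and stat -c in the usage note is the GNU coreutils form.

```shell
# install_pubkey PUBFILE SSH_DIR: append the key in PUBFILE to
# SSH_DIR/authorized_keys once, with the modes shown in the listing above.
install_pubkey() {
    pub="$1"; ssh_dir="$2"
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
    auth="$ssh_dir/authorized_keys"
    key=$(cat "$pub")
    # Only append when the exact key line is not already present.
    grep -qxF -- "$key" "$auth" 2>/dev/null || printf '%s\n' "$key" >> "$auth"
    chmod 600 "$auth"
}

# Usage (run on the target node after copying id_rsa.pub over):
#   install_pubkey /tmp/id_rsa.pub /root/.ssh          # root user
#   install_pubkey /tmp/id_rsa.pub /home/user1/.ssh    # User1
#   stat -c %a /root/.ssh/authorized_keys              # expect 600
```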

Public Key Authentication (PKA) Cont

SSH
ssh: Passwordless entry
ssh: Passwordless entry
ssh: Warning: Permanently added, key password, Passwordless entry

ssh <user>@<host>
ssh <user>@<host>
ssh <user>@<host>
ssh <host>
ssh <host>
ssh <host>

Public Key Authentication (PKA) Cont

Key pair issue: Permission denied (publickey,gssapi-keyex,gssapi-with-mic)
Key pair issue - NOT preferred resolution
Error: Permission denied, please try again

Snapshot / Recovery

Snapshot of Nodes
  Used Acronis True Image 2012
  Around 12 hours to take snapshot (example, 1 TB to 380 GB)
  Full backup method, Normal compression level

Recovery of Snapshots
  Used Acronis True Image 2012
  Around 12 hours to recover
  Recover whole disks and partitions
  Connected to LAN
  Verified - network connections, /etc/hosts, ping, SSH, reboot history

Other Features

Clonezilla
FSArchiver
All Partitions and MBR
OpenZFS
Kickstart
NIC information
LVM (Logical Volume Manager)
NTP (Network Time Protocol)

Other Features Cont

Accessing all Nodes from remote locations
  What is my ip
  Unix
  VNC
Copied folder structure
  find / -ls >

CDH 5.4.0 installation
Cloudera Express
HDFS, HBase, ZooKeeper, Oozie, Hive, Hue, Flume, Cloudera Impala, Sentry, Sqoop, Cloudera Search, and Spark
Kafka, Sqoop Netezza Connector, Sqoop Teradata Connector
Other services like YARN, HCatalog
