Cloudera on Microsoft Azure - Step-by-Step Page 1
Classified as Microsoft General
Tech Data
Cloudera on Azure
Contents
1 Tech Data Cloudera on Azure Step by Step
1.0 Things to know prior to using this Guide
1.1 Cloudera on Azure deployment
1.2 How to connect
1.3 Post-Deployment Tasks
2 Architecture Notions
1.0 Things to know prior to using this Guide
• Familiarize yourself with this document before diving in.
• All the screenshots in this Guide are for reference only.
• This Guide will assist you with the deployment of the Cloudera Bundle in an Azure CSP
subscription that was purchased through the StreamOne Portal.
o In-depth training on Azure is outside the scope of this guide.
• Accessing the Cloudera on Azure bundle
o You need to log in to the Azure portal to get the IP address.
▪ https://portal.azure.com
▪ Log in with the username and password that were created in
StreamOne and emailed to you.
• For example: [email protected]
• You will be given a one-time password and will need to change it.
▪ To access the Cloudera Platform, you must have the login and
password that were created during the StreamOne ordering
process.
▪ If you were not the person who placed the order in the StreamOne
ordering portal, ask that person for the login and password that were
initially created.
▪ If you need to access the underlying VMs, you will need the SSH key or
the password created during the ordering process.
SSH Private Key:
You will be given your Private SSH Key during the order of your Cloudera Platform.
Please make sure you secure this key and store it in a safe place as you might need it
for SSH access to any of your instances. Your key will be displayed only once and
there is no way to recover it later on. For security reasons, Tech Data does not keep a
copy.
• Prior knowledge of Cloudera and Microsoft Azure is required.
Services not included:
We do not install Apache Sentry, so you can integrate with your own Kerberos
Installation.
We also do not install Apache Kudu.
1.1 Cloudera on Azure deployment.
Connect to StreamOne Cloud Marketplace:
You can find the Microsoft Azure SKU in Most Viewed, by browsing by Categories or Vendor,
or by searching for it directly in the search field:
Click on Microsoft Azure:
You will then be able to browse the different SKUs. Click on the "ADD TO CART" button of the
registration SKU:
Click on "View Cart" button:
Click on 'Proceed to Checkout' button:
Fill in the End User information, or select an existing end user by email, and click on the
"Continue to Configuration" button.
The Configuration page should be displayed.
Fill in your Microsoft Partner Network ID.
Click on the "Create a New end customer Microsoft account" button.
Enter a unique domain name and click on the "Check Availability" button.
In the "Account Administration" module, select the "The End Customer email" radio button,
or select the "I will administer the account" radio button and enter the Delegate admin
email ID.
Click on "Continue to Payment" button.
Click on "Continue to Summary" button.
Verify the information shown and click on "Place Order" button.
Your order should be complete.
You should receive an email with your Microsoft Subscription Information:
And another email regarding the deployment of your Azure Bundle:
Now click on Reseller Portal, then Customer Admin. Look for your Customer and click on
IaaS/PaaS.
Then Click on Modify:
Then click on “Click to Configure”:
Information related to the selected bundle should be displayed.
Select a Location from the location drop-down and fill in the Resource Group Name.
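Before typing the Resource Group Name, it can help to check the name locally. The sketch below is an illustration only. It assumes the commonly documented Azure naming rules for resource groups (1 to 90 characters; letters, digits, underscores, hyphens, periods and parentheses; no trailing period). The function name is hypothetical, and the StreamOne form may apply stricter rules of its own.

```python
import re

# Assumed rule set: word characters, hyphen, period, parentheses; 1 to 90 characters.
_ALLOWED = re.compile(r"[-\w.()]{1,90}")


def is_valid_resource_group_name(name: str) -> bool:
    """Return True if name satisfies the assumed Azure resource group naming rules."""
    # A trailing period is rejected even though periods are allowed elsewhere.
    if name.endswith("."):
        return False
    return _ALLOWED.fullmatch(name) is not None


if __name__ == "__main__":
    # The candidate names are placeholders, not values from this guide.
    for candidate in ("cloudera-rg", "cloudera rg", "cloudera-rg."):
        print(candidate, is_valid_resource_group_name(candidate))
```

A name that passes this check can still be rejected by the portal, for example when it is already in use in the subscription.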
Fill in the Basic Information.
You can select the Deployment Size.
Please note that the Admin Username and the Admin Password will be used to access
the Cloudera Master through the SSH tunnel.
An SSH key pair is generated at this step; copy and save the private key.
Once the Cloudera bundle is deployed in the Azure portal, you may need this key to log in to
the underlying VMs.
SSH Private Key:
You will be given your Private SSH Key during the order of your Cloudera bundle.
Please make sure you secure this key and store it in a safe place as
you will need it for SSH access to any of your instances. Your key will be displayed
only once and there is no way to recover it later on. For security reasons, Tech Data
does not keep a copy.
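Because the private key is displayed only once, one way to store it safely is to write it to a file that only your user account can read. The following is a minimal sketch, assuming a Linux or macOS workstation; the function name and the file path are hypothetical and not part of the StreamOne process.

```python
import os
import stat
import tempfile
from pathlib import Path


def save_private_key(key_text: str, destination: Path) -> Path:
    """Write key_text to destination so that only the owner can read or write it."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    # O_EXCL refuses to overwrite an existing file; 0o600 means owner read/write only.
    fd = os.open(destination, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(key_text)
    # Re-apply the mode explicitly in case the process umask altered it.
    os.chmod(destination, stat.S_IRUSR | stat.S_IWUSR)
    return destination


if __name__ == "__main__":
    # Placeholder text only; paste the real key shown during ordering instead.
    demo_dir = Path(tempfile.mkdtemp())
    saved = save_private_key("PLACEHOLDER KEY MATERIAL\n", demo_dir / "cloudera_azure_key")
    print(saved, oct(saved.stat().st_mode & 0o777))
```

OpenSSH clients generally refuse private key files that other users can read, which is why the sketch restricts the file mode to the owner.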
In the Advanced Bundle Settings, you can select the VM size.
You can also choose to deploy the Workers on Standard or Premium Storage.
You can give a name to your Cloudera cluster and choose the DNS Name Prefix for your
Cluster.
Click on ‘Deploy Now’ button.
1.2 How to connect.
Connect to the Azure Portal with your credentials.
• You need to log in to the Azure portal to get the IP address.
▪ https://portal.azure.com
▪ Log in with the username and password that were created in
StreamOne and emailed to you.
• For example: [email protected]
• You will be given a one-time password and will need to change it.
You will then be connected to the Azure Portal. Go to Resource groups.
You will find the Resource Group into which your resources have been deployed.
You will then see your resources listed. Find the Virtual Machine whose name ends with “-mn0”. You will
use it to connect to the Cloudera Dashboard:
You need to retrieve its DNS Name:
You will use it to create an SSH tunnel.
Open a terminal and run the following command:
ssh -L 7180:10.3.0.4:7180 username@publicip
Here, username is the username you chose during the order process, and publicip is the DNS
name (or public IP address) you just retrieved.
When prompted, choose to continue and enter your password to log in to the Virtual Machine.
You can then open a web browser and go to http://localhost:7180 to reach the
Cloudera Manager dashboard.
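The tunnel steps above can be sketched in Python as a small helper that assembles the same ssh command and checks whether the forwarded port answers locally. This is only an illustration: the function names, the user name and the host name are hypothetical, and the private address 10.3.0.4 is taken from the command shown in this guide and may differ in your deployment.

```python
import shlex
import socket
from typing import List


def build_tunnel_command(username: str, host: str,
                         master_private_ip: str = "10.3.0.4",
                         port: int = 7180) -> List[str]:
    """Return the ssh argument list that forwards localhost:port to the master."""
    forward = f"{port}:{master_private_ip}:{port}"
    return ["ssh", "-L", forward, f"{username}@{host}"]


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Placeholder user and host; substitute your own values.
    command = build_tunnel_command("clouderaadmin", "example-mn0.example.com")
    print(shlex.join(command))
    print("Cloudera Manager reachable:", is_port_open("localhost", 7180))
```

The port check only reports True while the ssh session is running in another terminal, since closing that session also closes the tunnel.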
The default admin credentials are:
- Login: admin
- Password: admin
You will have to change them afterwards!
You will then be connected to your Cloudera Manager Dashboard:
And get started!
1.3 Post-Deployment Tasks.
Now that you have access to your Cloudera Manager Dashboard, you still need
to configure additional items. These tasks will cover the following:
• How to fix up the Warning Configuration Issues
• How to Upgrade your License
• How to Change the Admin Password and Create Users
1.3.1 Fix the Warning Configuration Issues.
You now have access to your Cloudera Manager Dashboard.
Some configuration items still need to be finished, and the Dashboard flags them
as alerts.
The first one to configure is the Memory Overcommit Validation Threshold (reported on the 5
hosts).
The second warning appears only if you deployed the Small sizing
(intended only for Dev and Test purposes).
ZooKeeper needs three servers to run in High Availability. Since the Small sizing
deploys only one Master, the system suggests running at least 3 servers.
The third one is tied to the Cloudera Management Services. You need to configure:
- The Java Heap Size of Service Monitor
- The Maximum Non-Java Memory of Service Monitor
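The ZooKeeper warning described above follows from majority-quorum arithmetic: an ensemble stays available only while more than half of its servers are running. The short sketch below works through the numbers; the function names are illustrative only.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of servers that can fail while a majority remains."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for servers in (1, 3, 5):
        print(servers, quorum_size(servers), tolerated_failures(servers))
```

With a single server, as in the Small sizing, no failure can be tolerated; with three servers, one can fail.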
1.3.2 How to Upgrade your License.
The system is currently licensed with a 60-day Trial. To upgrade the cluster with your
license, go to “Administration” then “License”.
Click on “Upgrade to Cloudera Enterprise”.
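To plan the license upgrade, you may want to know when the 60-day Trial runs out. The sketch below is a simple date calculation, assuming the trial period starts on the day the bundle was deployed; that starting point is an assumption, and the "License" page remains the authoritative source for the expiry date.

```python
from datetime import date, timedelta

TRIAL_LENGTH_DAYS = 60  # trial length stated in this guide


def trial_end(deployed_on: date) -> date:
    """Date on which the trial ends, assuming it started on deployed_on."""
    return deployed_on + timedelta(days=TRIAL_LENGTH_DAYS)


def days_remaining(deployed_on: date, today: date) -> int:
    """Days left in the trial, never negative."""
    return max((trial_end(deployed_on) - today).days, 0)


if __name__ == "__main__":
    # Example dates only.
    deployed = date(2024, 1, 1)
    print(trial_end(deployed), days_remaining(deployed, date(2024, 2, 1)))
```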
1.3.3 How to Change the Admin Password and Create Users.
To manage users, go to “Administration” then “Users and Roles”.
You can then change the admin password by clicking on “Actions”.
To create a new User, you can click on “Add Local User”.
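If you prefer to script the password change instead of using the Dashboard, the Cloudera Manager REST API is one possible route. The sketch below only builds the HTTP request and does not send it. It assumes that your Cloudera Manager version exposes a users resource at /api/<version>/users/<name> that accepts a PUT with a JSON body, and that the tunnel on localhost:7180 is open; the API version string "v19", the function name and the new password are placeholders, so check the API documentation of your release before relying on it.

```python
import base64
import json
import urllib.request


def build_password_change_request(base_url: str, api_version: str, user: str,
                                  current_password: str,
                                  new_password: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request that sets a new password for user."""
    # Assumed endpoint layout; verify it against your Cloudera Manager API docs.
    url = f"{base_url.rstrip('/')}/api/{api_version}/users/{user}"
    body = json.dumps({"name": user, "password": new_password}).encode("utf-8")
    # HTTP Basic authentication with the current credentials.
    token = base64.b64encode(f"{user}:{current_password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
    )


if __name__ == "__main__":
    request = build_password_change_request(
        "http://localhost:7180", "v19", "admin", "admin", "choose-a-strong-password")
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request)
```

Keeping the request construction separate from sending it makes it possible to inspect the URL and body first, which is useful when the exact API version is not yet known.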
2. Architecture Notions.
2.1 Small Sizing (Dev and Test).
Edge Node VM Roles:
"FLUME-AGENT-BASE",
"hbase-GATEWAY-BASE",
"hbase-HBASETHRIFTSERVER-BASE",
"ks_indexer-HBASE_INDEXER-BASE",
"hdfs-BALANCER-BASE",
"hdfs-GATEWAY-BASE",
"hdfs-NFSGATEWAY-BASE",
"hdfs-SECONDARYNAMENODE-BASE",
"hive-GATEWAY-BASE",
"hive-HIVEMETASTORE-BASE",
"hue-HUE_LOAD_BALANCER-BASE",
"hue-HUE_SERVER-BASE",
"sqoop_client-GATEWAY-BASE",
"oozie-OOZIE_SERVER-BASE",
"kafka-GATEWAY-BASE",
"kafka-KAFKA_BROKER-BASE",
"solr-SOLR_SERVER-BASE",
"spark_on_yarn-GATEWAY-BASE",
"yarn-GATEWAY-BASE"
VM Master 1 Roles:
"FLUME-AGENT-BASE",
"hbase-MASTER-BASE",
"hdfs-NAMENODE-BASE",
"hive-HIVESERVER2-BASE",
"sqoop_client-GATEWAY-BASE",
"impala-CATALOGSERVER-BASE",
"impala-STATESTORE-BASE",
"kafka-KAFKA_BROKER-BASE",
"hive-GATEWAY-BASE",
"spark_on_yarn-GATEWAY-BASE",
"spark_on_yarn-SPARK_YARN_HISTORY_SERVER-BASE",
"yarn-JOBHISTORY-BASE",
"yarn-RESOURCEMANAGER-BASE",
"zookeeper-SERVER-BASE"
VM Worker Roles (1, 2 and 3):
"FLUME-AGENT-BASE",
"hbase-REGIONSERVER-BASE",
"hdfs-DATANODE-BASE",
"hive-GATEWAY-BASE",
"sqoop_client-GATEWAY-BASE",
"impala-IMPALAD-BASE",
"kafka-KAFKA_BROKER-BASE",
"spark_on_yarn-GATEWAY-BASE",
"yarn-NODEMANAGER-BASE"
2.2 Medium Sizing.
Edge Node VM Roles:
"FLUME-AGENT-BASE",
"hbase-GATEWAY-BASE",
"hbase-HBASERESTSERVER-BASE",
"hbase-HBASETHRIFTSERVER-BASE",
"ks_indexer-HBASE_INDEXER-BASE",
"hdfs-BALANCER-BASE",
"hdfs-GATEWAY-BASE",
"hdfs-NFSGATEWAY-BASE",
"hdfs-HTTPFS-BASE",
"hive-GATEWAY-BASE",
"hive-HIVEMETASTORE-BASE",
"hue-HUE_LOAD_BALANCER-BASE",
"hue-HUE_SERVER-BASE",
"sqoop_client-GATEWAY-BASE",
"oozie-OOZIE_SERVER-BASE",
"kafka-GATEWAY-BASE",
"kafka-KAFKA_BROKER-BASE",
"solr-SOLR_SERVER-BASE",
"spark_on_yarn-GATEWAY-BASE",
"yarn-GATEWAY-BASE"
VM Master 1 Roles:
"FLUME-AGENT-BASE",
"hbase-MASTER-BASE",
"hdfs-NAMENODE-BASE",
"hive-GATEWAY-BASE",
"hive-HIVESERVER2-BASE",
"sqoop_client-GATEWAY-BASE",
"kafka-KAFKA_BROKER-BASE",
"spark_on_yarn-GATEWAY-BASE",
"spark_on_yarn-SPARK_YARN_HISTORY_SERVER-BASE",
"zookeeper-SERVER-BASE"
VM Master 2 Roles:
"FLUME-AGENT-BASE",
"hbase-MASTER-BASE",
"hdfs-SECONDARYNAMENODE-BASE",
"hive-GATEWAY-BASE",
"hive-HIVESERVER2-BASE",
"sqoop_client-GATEWAY-BASE",
"kafka-KAFKA_BROKER-BASE",
"spark_on_yarn-GATEWAY-BASE",
"yarn-JOBHISTORY-BASE",
"yarn-RESOURCEMANAGER-BASE",
"zookeeper-SERVER-BASE"
VM Master 3 Roles:
"FLUME-AGENT-BASE",
"hbase-MASTER-BASE",
"hive-GATEWAY-BASE",
"hive-HIVESERVER2-BASE",
"sqoop_client-GATEWAY-BASE",
"impala-CATALOGSERVER-BASE",
"impala-STATESTORE-BASE",
"kafka-KAFKA_BROKER-BASE",
"spark_on_yarn-GATEWAY-BASE",
"zookeeper-SERVER-BASE"
VM Worker Roles (1, 2, 3, 4, 5 and 6):
"FLUME-AGENT-BASE",
"hbase-REGIONSERVER-BASE",
"hdfs-DATANODE-BASE",
"hive-GATEWAY-BASE",
"sqoop_client-GATEWAY-BASE",
"impala-IMPALAD-BASE",
"kafka-KAFKA_BROKER-BASE",
"spark_on_yarn-GATEWAY-BASE",
"yarn-NODEMANAGER-BASE"
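The Edge Node role lists for the Small and Medium sizings are nearly identical, so the differences are easy to miss. The sketch below copies both lists from this guide into Python sets and computes what changes between them; the variable and function names are illustrative only.

```python
SMALL_EDGE = {
    "FLUME-AGENT-BASE", "hbase-GATEWAY-BASE", "hbase-HBASETHRIFTSERVER-BASE",
    "ks_indexer-HBASE_INDEXER-BASE", "hdfs-BALANCER-BASE", "hdfs-GATEWAY-BASE",
    "hdfs-NFSGATEWAY-BASE", "hdfs-SECONDARYNAMENODE-BASE", "hive-GATEWAY-BASE",
    "hive-HIVEMETASTORE-BASE", "hue-HUE_LOAD_BALANCER-BASE", "hue-HUE_SERVER-BASE",
    "sqoop_client-GATEWAY-BASE", "oozie-OOZIE_SERVER-BASE", "kafka-GATEWAY-BASE",
    "kafka-KAFKA_BROKER-BASE", "solr-SOLR_SERVER-BASE",
    "spark_on_yarn-GATEWAY-BASE", "yarn-GATEWAY-BASE",
}

MEDIUM_EDGE = {
    "FLUME-AGENT-BASE", "hbase-GATEWAY-BASE", "hbase-HBASERESTSERVER-BASE",
    "hbase-HBASETHRIFTSERVER-BASE", "ks_indexer-HBASE_INDEXER-BASE",
    "hdfs-BALANCER-BASE", "hdfs-GATEWAY-BASE", "hdfs-NFSGATEWAY-BASE",
    "hdfs-HTTPFS-BASE", "hive-GATEWAY-BASE", "hive-HIVEMETASTORE-BASE",
    "hue-HUE_LOAD_BALANCER-BASE", "hue-HUE_SERVER-BASE",
    "sqoop_client-GATEWAY-BASE", "oozie-OOZIE_SERVER-BASE", "kafka-GATEWAY-BASE",
    "kafka-KAFKA_BROKER-BASE", "solr-SOLR_SERVER-BASE",
    "spark_on_yarn-GATEWAY-BASE", "yarn-GATEWAY-BASE",
}


def role_diff(before: set, after: set) -> tuple:
    """Return (roles added in after, roles removed from before)."""
    return after - before, before - after


if __name__ == "__main__":
    added, removed = role_diff(SMALL_EDGE, MEDIUM_EDGE)
    print("Added on Medium edge node:", sorted(added))
    print("Removed from edge node:", sorted(removed))
```

The result reflects the lists above: in the Medium sizing the Secondary NameNode moves off the Edge Node to Master 2, while the HBase REST server and HttpFS are added to the Edge Node.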
2.3 Large Sizing.
Edge Node VM Roles:
"FLUME-AGENT-BASE",
"hbase-GATEWAY-BASE",
"hbase-HBASERESTSERVER-BASE",
"hbase-HBASETHRIFTSERVER-BASE",
"ks_indexer-HBASE_INDEXER-BASE",
"hdfs-BALANCER-BASE",
"hdfs-GATEWAY-BASE",
"hdfs-NFSGATEWAY-BASE",
"hdfs-HTTPFS-BASE",
"hive-GATEWAY-BASE",
"hive-HIVEMETASTORE-BASE",
"hue-HUE_LOAD_BALANCER-BASE",
"hue-HUE_SERVER-BASE",
"sqoop_client-GATEWAY-BASE",
"oozie-OOZIE_SERVER-BASE",
"kafka-GATEWAY-BASE",
"kafka-KAFKA_BROKER-BASE",
"solr-SOLR_SERVER-BASE",
"spark_on_yarn-GATEWAY-BASE",
"yarn-GATEWAY-BASE"
VM Master 1 Roles:
"FLUME-AGENT-BASE",
"hbase-MASTER-BASE",
"hdfs-NAMENODE-BASE",
"hive-GATEWAY-BASE",
"hive-HIVESERVER2-BASE",
"sqoop_client-GATEWAY-BASE",
"kafka-KAFKA_BROKER-BASE",
"spark_on_yarn-GATEWAY-BASE",
"spark_on_yarn-SPARK_YARN_HISTORY_SERVER-BASE",
"zookeeper-SERVER-BASE"
VM Master 2 Roles:
"FLUME-AGENT-BASE",
"hbase-MASTER-BASE",
"hdfs-SECONDARYNAMENODE-BASE",
"hive-GATEWAY-BASE",
"hive-HIVESERVER2-BASE",
"sqoop_client-GATEWAY-BASE",
"kafka-KAFKA_BROKER-BASE",
"spark_on_yarn-GATEWAY-BASE",
"yarn-JOBHISTORY-BASE",
"yarn-RESOURCEMANAGER-BASE",
"zookeeper-SERVER-BASE"
VM Master 3 Roles:
"FLUME-AGENT-BASE",
"hbase-MASTER-BASE",
"hive-GATEWAY-BASE",
"hive-HIVESERVER2-BASE",
"sqoop_client-GATEWAY-BASE",
"impala-CATALOGSERVER-BASE",
"impala-STATESTORE-BASE",
"kafka-KAFKA_BROKER-BASE",
"spark_on_yarn-GATEWAY-BASE",
"zookeeper-SERVER-BASE"
VM Worker Roles (1, 2, 3, up to 16):
"FLUME-AGENT-BASE",
"hbase-REGIONSERVER-BASE",
"hdfs-DATANODE-BASE",
"hive-GATEWAY-BASE",
"sqoop_client-GATEWAY-BASE",
"impala-IMPALAD-BASE",
"kafka-KAFKA_BROKER-BASE",
"spark_on_yarn-GATEWAY-BASE",
"yarn-NODEMANAGER-BASE"
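The three sizings can be summarized by their host counts. The sketch below is derived from the role lists in this section (one Edge Node per sizing, one or three Masters, and 3, 6 or up to 16 Workers); the dictionary layout and function names are illustrative, and the Large worker count is the stated upper bound.

```python
SIZINGS = {
    "Small": {"edge": 1, "master": 1, "worker": 3},
    "Medium": {"edge": 1, "master": 3, "worker": 6},
    "Large": {"edge": 1, "master": 3, "worker": 16},  # "up to 16" workers
}


def total_hosts(sizing: str) -> int:
    """Total number of VMs for the given sizing."""
    return sum(SIZINGS[sizing].values())


def zookeeper_servers(sizing: str) -> int:
    """Each Master carries one zookeeper-SERVER-BASE role in the lists above."""
    return SIZINGS[sizing]["master"]


if __name__ == "__main__":
    for name in SIZINGS:
        print(name, total_hosts(name), zookeeper_servers(name))
```

The Small sizing totals 5 hosts with a single ZooKeeper server, while Medium and Large each run three ZooKeeper servers.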