Building a Hadoop Cluster on Amazon EC2 using Cloudera

April 2013

http://randyzwitch.com/big-data-hadoop-amazon-ec2-cloudera-part-2

Create Name Node: m1.large

Ubuntu Server 12.04.1 LTS 64-bit

Create Name Node: m1.large

You can use the defaults for most wizard screens except the Firewall screen

Launch Instance, Connect Via SSH
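
The SSH step can be sketched roughly as follows. This is only an illustration: the key file name `my-key.pem` is hypothetical, the host is the example Public DNS used elsewhere in this deck, and the login user `ubuntu` is assumed because it is the default account on Ubuntu AMIs.

```python
import shlex


def ssh_command(key_path: str, public_dns: str, user: str = "ubuntu") -> str:
    """Build the ssh command line for connecting to an EC2 instance.

    key_path is the private key downloaded when the key pair was created
    (the file name used below is a placeholder, not from the slides).
    """
    return shlex.join(["ssh", "-i", key_path, f"{user}@{public_dns}"])


# Example address taken from the deck; substitute your instance's Public DNS.
print(ssh_command("my-key.pem", "ec2-50-17-162-58.compute-1.amazonaws.com"))
```

Paste the printed command into a terminal on your own machine. If SSH rejects the key, its file permissions may be too open (`chmod 400` on the key file is the usual fix).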

Download & Run Cloudera Manager

Depending on your settings, you might need to run the installer with “sudo”

It might take about 5 minutes to go through the licensing and installation menus

Open Browser And Go To EC2 Instance Public DNS At Port 7180

Ex: http://ec2-50-17-162-58.compute-1.amazonaws.com:7180

On this first login, you set your username and password credentials
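
If the page does not load, it can help to confirm that the port is reachable before suspecting the installation. A minimal sketch, assuming only the Python standard library; the helper names `service_url` and `port_is_open` are made up for this example and are not part of Cloudera Manager:

```python
import socket

CLOUDERA_MANAGER_PORT = 7180  # port given on this slide


def service_url(public_dns: str, port: int) -> str:
    """Build the browser URL for a service running on an EC2 instance."""
    return f"http://{public_dns}:{port}"


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False


# Example address from the slide; substitute your own instance's Public DNS.
print(service_url("ec2-50-17-162-58.compute-1.amazonaws.com", CLOUDERA_MANAGER_PORT))
```

Calling `port_is_open(host, CLOUDERA_MANAGER_PORT)` from your own machine and getting `False` usually points at the firewall settings chosen when the instance was created, or at an installer that has not finished yet.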

Use Defaults, ‘18’ for Instances

Type In AWS Access Key ID & Secret Access Key

Credentials can be found under “Security Credentials” in the EC2 dashboard

Review Settings Then Install!

Provisioning Instances will take a few minutes

If Any Installations Fail, Retry Until Success

Make Sure Consistency Check Passes

Cluster Services Will Start, Then Success!

Finding Hue Public DNS

Hue (Hadoop User Experience) is the more user-friendly way to interact with Hadoop

Clicking the “Hue Web UI” button doesn’t work, because it links to the instance’s internal Amazon EC2 address, which can’t be reached from outside EC2

Instead, find the Public DNS for this internal address in the Amazon EC2 dashboard

Type the internal address into the search box to find the instance running Hue

The Public DNS shown for that instance is the address to use to access Hue
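
The same lookup can be scripted instead of done through the dashboard search box. The sketch below is an illustration, not part of the original walkthrough: it assumes the third-party `boto3` library is installed and configured with your AWS credentials, that the internal address is a private DNS name (the `ip-10-0-0-1.ec2.internal` value in the usage note is a made-up placeholder), and that the cluster runs in `us-east-1`, the region behind `compute-1.amazonaws.com` addresses.

```python
from typing import List


def public_dns_from_response(response: dict) -> List[str]:
    """Pull the Public DNS names out of a describe_instances response."""
    return [
        instance["PublicDnsName"]
        for reservation in response.get("Reservations", [])
        for instance in reservation.get("Instances", [])
        if instance.get("PublicDnsName")  # empty when no public address exists
    ]


def find_public_dns(internal_address: str, region: str = "us-east-1") -> List[str]:
    """Look up the Public DNS of the instance with the given private DNS name."""
    import boto3  # third-party; imported here so the helper above works without it

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_instances(
        Filters=[{"Name": "private-dns-name", "Values": [internal_address]}]
    )
    return public_dns_from_response(response)
```

For example, `find_public_dns("ip-10-0-0-1.ec2.internal")` would return a list with the Public DNS of the matching instance, or an empty list if nothing matches. If Cloudera Manager shows a private IP address rather than a DNS name, the `private-ip-address` filter can be used instead.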

Hue is accessed via Port 8888

Ex: http://ec2-54-224-118-78.compute-1.amazonaws.com:8888

Pick your username and password carefully, because this account is the superuser

If You See Hue, You’re Ready For Analysis!

The Hive editor in Hue lets you use SQL-like syntax to create MapReduce jobs

Reference: http://blog.cloudera.com/blog/2013/03/how-to-create-a-cdh-cluster-on-amazon-ec2-via-cloudera-manager/