amazon redshift is a fully managed, petabyte-scale data warehouse service … · 2019-06-27 ·...

11
Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. An Amazon Redshift data warehouse is a collection of computing resources called nodes, which are organized into a group called a cluster. Each cluster runs an Amazon Redshift engine and contains one or more databases. Depending on your data warehousing needs, you can start with a small, single-node cluster and easily scale up to a larger, multi-node cluster as your requirements change. You can add or remove compute nodes to the cluster without any interruption to the service. Snapshots are point-in-time backups of a cluster which stores in S3. There are two types of snapshots: automated and manual.When required to restore from a snapshot, Amazon Redshift creates a new cluster and imports data from the specified snapshot. Amazon Redshift creates one database when provision a cluster.Can create additional databases as needed by running a SQL command. While specify master has access to all DB that are created within the cluster.This master user is a superuser who is the only user with access to the database initially, though this user can create additional superusers and users. Create an IAM Role Sign in to the AWS Management Console and open the IAM console at https://console.aws.amazon.com/iam/. In the navigation pane, choose Roles. Choose Create role.

Upload: others

Post on 27-Jun-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Amazon Redshift is a fully managed, petabyte-scale data warehouse service … · 2019-06-27 · Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. An Amazon Redshift data warehouse is a collection of computing resources called nodes, which are organized into a group called a cluster. Each cluster runs an Amazon Redshift engine and contains one or more databases. Depending on your data warehousing needs, you can start with a small, single-node cluster and easily scale up to a larger, multi-node cluster as your requirements change. You can add or remove compute nodes to the cluster without any interruption to the service. Snapshots are point-in-time backups of a cluster which stores in S3. There are two types of snapshots: automated and manual.When required to restore from a snapshot, Amazon Redshift creates a new cluster and imports data from the specified snapshot. Amazon Redshift creates one database when provision a cluster.Can create additional databases as needed by running a SQL command. While specify master has access to all DB that are created within the cluster.This master user is a superuser who is the only user with access to the database initially, though this user can create additional superusers and users.

Create an IAM Role

Sign in to the AWS Management Console and open the IAM console at

https://console.aws.amazon.com/iam/.

In the navigation pane, choose Roles.

Choose Create role.

Page 2: Amazon Redshift is a fully managed, petabyte-scale data warehouse service … · 2019-06-27 · Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud

In the AWS Service group, choose Redshift.

Under Select your use case, choose Redshift - Customizable then choose Next: Permissions.

On the Attach permissions policies page, choose AmazonS3ReadOnlyAccess. You can leave the

default setting for Set permissions boundary. Then choose Next: Tags.

Page 3: Amazon Redshift is a fully managed, petabyte-scale data warehouse service … · 2019-06-27 · Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud

The Add tags page appears. You can optionally add tags. Choose Next: Review.

For Role name, type a name for your role. For this tutorial, type myRedshiftRole.

Review the information, and then choose Create Role.

Choose the role name of the role you just created.

Copy the Role ARN to your clipboard—this value is the Amazon Resource Name (ARN) for the role

that you just created. You use that value when you use the COPY command to load data

Page 4: Amazon Redshift is a fully managed, petabyte-scale data warehouse service … · 2019-06-27 · Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud

Launch an Amazon Redshift Cluster

Sign in to the AWS Management Console and open the Amazon Redshift console at

https://console.aws.amazon.com/redshift/

On the Amazon Redshift Dashboard, choose Quick launch cluster.

On the Cluster specifications page, enter the following values and then choose Launch cluster:

Node type: Choose dc2.large.

Number of compute nodes: Keep the default value of 2.

Cluster identifier: Enter the value examplecluster.

Master user name: Keep the default value of awsuser.

Master user password and Confirm password: Enter a password for the master user account.

Page 5: Amazon Redshift is a fully managed, petabyte-scale data warehouse service … · 2019-06-27 · Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud

Database port: Accept the default value of 5439.

Available IAM roles: Choose myRedshiftRole.

Quick Launch automatically creates a default database named dev.

Page 6: Amazon Redshift is a fully managed, petabyte-scale data warehouse service … · 2019-06-27 · Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud

A confirmation page appears and the cluster takes a few minutes to finish. Choose Close to return to

the list of clusters.

On the Clusters page, choose the cluster that you just launched and review the Cluster Status

information. Make sure that the Cluster Status is available and the Database Health is healthy

Page 7: Amazon Redshift is a fully managed, petabyte-scale data warehouse service … · 2019-06-27 · Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud

Providing Access to the Cluster

In the Amazon Redshift console, in the navigation pane, choose Clusters. Choose examplecluster to open it, and make sure that you are on the Configuration tab. Under Cluster Properties, for VPC Security Groups, choose your security group. After your security group opens in the Amazon EC2 console, choose the Inbound tab. Choose Edit, Add Rule, and enter the following, then choose Save: Type: Custom TCP Rule. Protocol: TCP. Port Range: type the same port number that you used when you launched the cluster. The default port for Amazon Redshift is 5439, but your port might be different. Source: select Custom, then type 0.0.0.0/0. Important Using 0.0.0.0/0 is not recommended for anything other than demonstration purposes because it allows access from any computer on the internet. In a real environment, you would create inbound rules based on your own network settings.

Page 8: Amazon Redshift is a fully managed, petabyte-scale data warehouse service … · 2019-06-27 · Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud

To attach the required IAM policies for the Query Editor

Sign in to the AWS Management Console and open the IAM console at

https://console.aws.amazon.com/iam/.

Choose Users.

Choose the user that needs access to the Query Editor.

Choose Add permissions.

Choose Attach existing policies directly.

Page 9: Amazon Redshift is a fully managed, petabyte-scale data warehouse service … · 2019-06-27 · Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud

For Policy names, choose AmazonRedshiftQueryEditor and AmazonRedshiftReadOnlyAccess.

Choose Next: Review.

Choose Add permissions.

To use the Query Editor

Sign in to the AWS Management Console and open the Amazon Redshift console at

https://console.aws.amazon.com/redshift/.

In the navigation pane, choose Query Editor.

In the Credentials dialog box, enter the following values and then choose Connect:

Cluster: Choose examplecluster.

Database: dev.

Database user: awsuser

Password: Enter the password that you specified when you launched the cluster.

For Schema, choose public to create a new table based on that schema.

Page 10: Amazon Redshift is a fully managed, petabyte-scale data warehouse service … · 2019-06-27 · Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud
Page 11: Amazon Redshift is a fully managed, petabyte-scale data warehouse service … · 2019-06-27 · Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud

COPY command to load large datasets into Amazon Redshift from Amazon S3 or DynamoDB.

eg.

copy users from 's3://awssampledbuswest2/tickit/allusers_pipe.txt'

credentials 'aws_iam_role=arn:aws:iam::876085007297:role/myRedshiftRole'

delimiter '|' region 'us-west-2';

There a cost calculator provided to check the savings before you move to AWS.

https://awstcocalculator.com/