redshift introduction

Amazon Redshift Saturday, December 6, 2014

Upload: datakitchen

Post on 25-Jul-2015

Page 1: Redshift Introduction

Amazon Redshift
Saturday, December 6, 2014

Page 2: Redshift Introduction

Agenda

08:30 AM Breakfast
09:00 AM Introduction and Strengths of Technologies
10:00 AM Break + set up query tool
10:20 AM Hadoop hands-on
10:55 AM Break
11:10 AM Redshift hands-on
11:40 AM Operationalizing your code
12:00 PM Adjourn

04/14/2023 2

Page 3: Redshift Introduction

Session Goals

• Understand:
  • Why an analytic database?
  • What is Amazon Redshift?

• Do:
  • ‘Fire up’ a Redshift database
  • Load data
  • Run a few queries
  • Shut it down

• Have fun!

Page 4: Redshift Introduction

Why an Analytic Database?

Why use one?
• It is a database optimized for read-only queries.
• It’s fast.
• It can handle a lot of data.

Why not use one?
• Poor transaction processing (aka OLTP)
• Rollback, multi-phase commits, etc.

Page 5: Redshift Introduction

Under the hood.

Analytic databases typically have features like:
• Compression
• Column (as opposed to row) storage
• Parallel queries across clusters of machines
• Support for partitioning
• Other cool stuff to make your queries fast

Page 6: Redshift Introduction

Columns vs Row Storage
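
The original slide shows this as a picture. As a rough stand-in, here is a minimal Python sketch of the two layouts. The rows are made-up toy values, and real analytic databases store columns in on-disk blocks rather than Python lists, so treat this as an illustration of the idea only.

```python
# Toy illustration only: three made-up rows shaped like the rankings table.
rows = [
    ("a.com", 10, 5),
    ("b.com", 50, 12),
    ("c.com", 30, 7),
]

# Row storage: the fields of each record are kept together.
row_store = list(rows)

# Column storage: the values of each column are kept together.
column_store = {
    "pageURL": [r[0] for r in rows],
    "pageRank": [r[1] for r in rows],
    "avgDuration": [r[2] for r in rows],
}


def avg_rank_row_store(store):
    # Has to walk every full record even though only one field is needed.
    return sum(record[1] for record in store) / len(store)


def avg_rank_column_store(store):
    # Reads one list and never touches the other columns.
    ranks = store["pageRank"]
    return sum(ranks) / len(ranks)


print(avg_rank_row_store(row_store), avg_rank_column_store(column_store))
```

Both functions return the same answer. The difference is how much data each one has to read to get there, which is why column storage suits queries that aggregate a few columns over many rows.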

Page 7: Redshift Introduction

Parallel Queries
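
The slide itself is a diagram. One way to picture the idea: each node computes a partial result on its own slice of the data, and a leader combines the partials. The sketch below assumes a simple round-robin split and uses threads in place of machines. The function names are hypothetical and this is not how Redshift is implemented internally.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_slices(values, n_slices):
    # Round-robin distribution: one simple way to spread rows across nodes.
    return [values[i::n_slices] for i in range(n_slices)]


def partial_sum_and_count(chunk):
    # Work done independently on each "node".
    return sum(chunk), len(chunk)


def parallel_average(values, n_slices=4):
    slices = split_into_slices(values, n_slices)
    with ThreadPoolExecutor(max_workers=n_slices) as pool:
        partials = list(pool.map(partial_sum_and_count, slices))
    # The "leader" step: combine the partial results into the final answer.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


durations = list(range(1, 101))  # made-up values standing in for a column
print(parallel_average(durations))
```

An average cannot be combined from per-slice averages directly, so each slice returns a sum and a count instead.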

Page 8: Redshift Introduction

Compression
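
The slide content here was an image. As an illustration, run-length encoding is one common column compression scheme: repeated neighbouring values are stored once with a count. The sketch below is a toy version on made-up country codes and is not Redshift's on-disk format.

```python
from itertools import groupby


def rle_encode(values):
    # Collapse each run of equal neighbouring values into (value, run_length).
    return [(value, len(list(group))) for value, group in groupby(values)]


def rle_decode(pairs):
    # Expand (value, run_length) pairs back into the original sequence.
    return [value for value, count in pairs for _ in range(count)]


country_codes = ["USA", "USA", "USA", "GBR", "GBR", "USA"]  # made-up column
encoded = rle_encode(country_codes)
print(encoded)
```

This works well in column storage because the values of a single column sit next to each other and often repeat.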

Page 9: Redshift Introduction

Amazon Redshift is an Example of an Analytic Database

Page 10: Redshift Introduction

Amazon Redshift uses typical SQL to query the database

Page 11: Redshift Introduction

Let’s Get Started!

The basics:
• You will need an AWS account
  • AWS Secret Key
  • AWS Access Key

• Install SQL Workbench
  • http://www.sql-workbench.net/manual/install.html

• Install PostgreSQL JDBC drivers:
  • http://jdbc.postgresql.org/

Page 12: Redshift Introduction

Let’s Get Started!: https://aws.amazon.com/

Click Here

Page 13: Redshift Introduction

Redshift: https://console.aws.amazon.com/redshift/.

Click Here

Page 14: Redshift Introduction

Launch: http://docs.aws.amazon.com/redshift/latest/gsg/rs-gsg-launch-sample-cluster.html

Fill these out

Page 15: Redshift Introduction

Single Node: https://console.aws.amazon.com/redshift/home?region=us-east-1#launch-cluster:

Single Node

Page 16: Redshift Introduction

Security: https://console.aws.amazon.com/redshift/home?region=us-east-1#launch-cluster:

East, not in VPC, default, no alarms (below)

Page 17: Redshift Introduction

Review: https://console.aws.amazon.com/redshift/home?region=us-east-1#launch-cluster:

Review

Page 18: Redshift Introduction

Launch!:

Click

Page 19: Redshift Introduction

Launch!:

Click

Page 20: Redshift Introduction

Wait:

Wait, then click

Page 21: Redshift Introduction

When Active:

You’ll need these details

Page 22: Redshift Introduction

Connect with SQL Workbench:

Select Connect Window

Page 23: Redshift Introduction

Connect with SQL Workbench:

Fill this out

Page 24: Redshift Introduction

Get the JDBC URL

Copy this
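
For orientation, the URL you copy from the console has a predictable shape when you connect through the PostgreSQL JDBC driver. The sketch below assembles one from its parts. The host name and database name are hypothetical placeholders, and 5439 is assumed as the port because it is the Redshift default. In the workshop, copy the real value from the console rather than building it by hand.

```python
def build_jdbc_url(endpoint_host, database, port=5439):
    # PostgreSQL-driver style URL: jdbc:postgresql://<host>:<port>/<database>
    return f"jdbc:postgresql://{endpoint_host}:{port}/{database}"


# Hypothetical endpoint and database name, for illustration only.
example_url = build_jdbc_url(
    "examplecluster.abc123.us-east-1.redshift.amazonaws.com", "dev"
)
print(example_url)
```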

Page 25: Redshift Introduction

Connect with SQL Workbench:

Paste and Fill this out

Page 26: Redshift Introduction

Success!:

Page 27: Redshift Introduction

New SQL Tab

Add Tab

Page 28: Redshift Introduction

New SQL Tab

Add Tab

Page 29: Redshift Introduction

Make Tables

Create Some Tables

CREATE TABLE rankings (
    pageURL     VARCHAR(300),
    pageRank    INT,
    avgDuration INT
);

CREATE TABLE uservisits (
    sourceIP       VARCHAR(116),
    destinationURL VARCHAR(100),
    visitDate      DATE,
    adRevenue      FLOAT,
    UserAgent      VARCHAR(256),
    cCode          CHAR(3),
    lCode          CHAR(6),
    searchWord     VARCHAR(32),
    duration       INT
);

Page 30: Redshift Introduction

Load Data

Load Data from S3

copy uservisits
FROM 's3://big-data-benchmark/pavlo/text/tiny/uservisits/'
CREDENTIALS 'aws_access_key_id=<your access key>;aws_secret_access_key=<your secret key>'
delimiter ',';

copy rankings
FROM 's3://big-data-benchmark/pavlo/text/tiny/rankings/'
CREDENTIALS 'aws_access_key_id=<your access key>;aws_secret_access_key=<your secret key>'
delimiter ',';

Page 31: Redshift Introduction

Load Bigger Data

Load Data from S3

's3://big-data-benchmark/pavlo/text/tiny/uservisits/'

-- options for the size directory ("tiny" above): "tiny", "1node", "5nodes", "10nodes"
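
To switch data sizes without retyping the whole command, you could generate the COPY statement from the table name and size. The helper below is a hypothetical sketch, not part of the workshop material. It assumes the other size directories follow the same layout as the "tiny" path shown above, and it leaves the credentials as placeholders for you to fill in.

```python
BASE = "s3://big-data-benchmark/pavlo/text"
SIZES = ("tiny", "1node", "5nodes", "10nodes")


def build_copy(table, size, access_key="<your key>", secret_key="<your key>"):
    # Reject sizes that are not listed on the slide.
    if size not in SIZES:
        raise ValueError(f"size must be one of {SIZES}, got {size!r}")
    # No spaces around '=' inside the credentials string.
    return (
        f"copy {table} FROM '{BASE}/{size}/{table}/' "
        f"CREDENTIALS 'aws_access_key_id={access_key};"
        f"aws_secret_access_key={secret_key}' "
        "delimiter ',';"
    )


print(build_copy("rankings", "1node"))
```

Paste the printed statement into a SQL Workbench tab after replacing the key placeholders. Larger sizes take longer to load on a single node.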

Page 32: Redshift Introduction

Simple Queries

Query

SELECT * FROM uservisits LIMIT 100;
SELECT COUNT(*) FROM uservisits;

SELECT * FROM rankings LIMIT 100;
SELECT COUNT(*) FROM rankings;

Page 33: Redshift Introduction

Complex Queries

Query

SELECT pageURL, pageRank FROM rankings WHERE pageRank > 10;

SELECT sourceIP, SPLIT_PART(sourceIP, '.', 1) as fn, SPLIT_PART(sourceIP, '.', 2) as sn FROM uservisits LIMIT 100;
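
If SPLIT_PART is new to you, the sketch below mimics it in Python: it splits a string on a delimiter and returns the Nth piece, counting from 1. The Python function is a hypothetical stand-in written for this illustration, and the IP address is a made-up example.

```python
def split_part(text, delimiter, part):
    # Parts are numbered from 1; a part beyond the last piece gives "".
    pieces = text.split(delimiter)
    if 1 <= part <= len(pieces):
        return pieces[part - 1]
    return ""


source_ip = "192.168.1.10"  # made-up example value
fn = split_part(source_ip, ".", 1)  # first octet, like the "fn" column
sn = split_part(source_ip, ".", 2)  # second octet, like the "sn" column
print(fn, sn)
```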

SELECT sourceIP,
       SUM(adRevenue) AS totalRevenue,
       AVG(pageRank) AS pageRank
FROM rankings R
JOIN (SELECT sourceIP, destinationURL, adRevenue FROM uservisits uv) NUV
  ON (R.pageURL = NUV.destinationURL)
GROUP BY sourceIP
ORDER BY totalRevenue DESC
LIMIT 100;
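
To see what the join query computes before running it on real data, you can try it on a handful of rows locally. The sketch below uses Python's built-in sqlite3 module as a stand-in for Redshift. The rows are made up for illustration, and the uservisits table is trimmed to the three columns the query uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rankings (pageURL TEXT, pageRank INT, avgDuration INT);
CREATE TABLE uservisits (sourceIP TEXT, destinationURL TEXT, adRevenue REAL);
-- Made-up toy rows, not benchmark data.
INSERT INTO rankings VALUES ('a.com', 10, 5), ('b.com', 30, 7);
INSERT INTO uservisits VALUES
  ('1.1.1.1', 'a.com', 1.0),
  ('1.1.1.1', 'b.com', 2.0),
  ('2.2.2.2', 'b.com', 0.5);
""")

query = """
SELECT sourceIP,
       SUM(adRevenue) AS totalRevenue,
       AVG(pageRank) AS pageRank
FROM rankings R
JOIN (SELECT sourceIP, destinationURL, adRevenue FROM uservisits uv) NUV
  ON (R.pageURL = NUV.destinationURL)
GROUP BY sourceIP
ORDER BY totalRevenue DESC
LIMIT 100;
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)
```

Each output row is one visitor IP with the total ad revenue it generated and the average rank of the pages it visited, highest revenue first.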

Page 34: Redshift Introduction

Shut it down!

Click

Page 35: Redshift Introduction

Shut it down!

Click

Page 36: Redshift Introduction

Shut it down!

No snapshot

Page 37: Redshift Introduction

Shut it down!

Page 38: Redshift Introduction

Thanks … happy querying!

See also
• http://docs.aws.amazon.com/redshift/latest/gsg/getting-started.html
