Redshift Introduction
TRANSCRIPT
Amazon Redshift
Saturday, December 6, 2014
Agenda
08:30 AM Breakfast
09:00 AM Introduction and Strengths of Technologies
10:00 AM Break + set up query tool
10:20 AM Hadoop hands-on
10:55 AM Break
11:10 AM Redshift hands-on
11:40 AM Operationalizing your code
12:00 PM Adjourn
04/14/2023 2
Session Goals
• Understand:
  • Why an analytic database?
  • What is Amazon Redshift?
• Do:
  • ‘Fire up’ a Redshift database
  • Load data
  • Do a few queries
  • Shut it down
• Have fun!
Why an Analytic Database?
Why use one?
• It’s a database optimized for read-only queries.
• It’s fast.
• It can handle a lot of data.
Why not use one?
• Poor transaction processing (aka OLTP)
• Weak on rollback, multi-phase commits, etc.
Under the hood.
Analytic databases typically have features like:
• Compression
• Column (as opposed to row) storage
• Parallel queries across clusters of machines
• Support for partitioning
• Other cool stuff to make your queries fast
Columns vs Row Storage
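To make the row-versus-column idea concrete, here is a minimal Python sketch. It is only an illustration with made-up rows shaped like the rankings table used later in the deck, not how Redshift lays out data on disk. It counts how many values each layout has to touch to average a single column.

```python
# Toy data (made up): three rows shaped like the "rankings" table.
rows = [
    ("a.com", 12, 30),
    ("b.com", 7, 45),
    ("c.com", 25, 10),
]

# Row storage: each record is kept together.
row_store = list(rows)

# Column storage: each column is kept together.
column_store = {
    "pageURL": [r[0] for r in rows],
    "pageRank": [r[1] for r in rows],
    "avgDuration": [r[2] for r in rows],
}

def avg_rank_row_store(store):
    """Average pageRank; every full record is read to reach one field."""
    values_touched = sum(len(record) for record in store)
    total = sum(record[1] for record in store)
    return total / len(store), values_touched

def avg_rank_column_store(store):
    """Average pageRank; only the pageRank column is read."""
    column = store["pageRank"]
    return sum(column) / len(column), len(column)

row_avg, row_touched = avg_rank_row_store(row_store)
col_avg, col_touched = avg_rank_column_store(column_store)
print("row store:   ", row_avg, "values touched:", row_touched)
print("column store:", col_avg, "values touched:", col_touched)
```

Both layouts give the same answer, but the column layout reads only the one column the query asks for.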
Parallel Queries
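A rough Python sketch of the idea, assuming a simple "split, compute partial results, combine" model. The function names and the use of threads are illustrative assumptions, not a description of Redshift internals. Each worker aggregates its own slice of the data and a final step merges the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum_count(chunk):
    """Work done by one (hypothetical) node: aggregate its own slice."""
    return sum(chunk), len(chunk)

def parallel_average(values, workers=4):
    """Split the data into slices, aggregate each in parallel, then combine."""
    if not values:
        raise ValueError("values must not be empty")
    # Round-robin the values across workers; drop empty slices.
    chunks = [values[i::workers] for i in range(workers)]
    chunks = [c for c in chunks if c]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_count, chunks))
    # Combine step: merge the partial sums and counts.
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count

durations = list(range(1, 101))  # made-up column values
print(parallel_average(durations))
```

The combine step works because sums and counts can be merged across slices; an average of per-slice averages would not be correct in general.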
Compression
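As one simple illustration, the Python sketch below applies run-length encoding to a column of repeated values. The slides do not say which compression scheme is meant, so treat this as an assumed example of why columns with repeated values compress well; the sample country codes are made up.

```python
from itertools import groupby

def rle_encode(column):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(column)]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]

# Made-up column values, similar in spirit to a country-code column.
country_codes = ["USA", "USA", "USA", "CAN", "CAN", "USA"]
encoded = rle_encode(country_codes)
print(encoded)
print(rle_decode(encoded) == country_codes)
```

Storing a column together puts similar values next to each other, which is what gives schemes like this something to compress.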
Amazon Redshift is an Example of an Analytic Database
Amazon Redshift uses typical SQL to query the database
Let’s Get Started!
The basics:
• You will need an AWS account
• AWS Secret Key
• AWS Access Key
• Install SQL Workbench
  • http://www.sql-workbench.net/manual/install.html
• Install Postgres JDBC drivers:
  • http://jdbc.postgresql.org/
Let’s Get Started!: https://aws.amazon.com/
Click Here
Redshift: https://console.aws.amazon.com/redshift/.
Click Here
Launch: http://docs.aws.amazon.com/redshift/latest/gsg/rs-gsg-launch-sample-cluster.html
Fill these out
Single Node: https://console.aws.amazon.com/redshift/home?region=us-east-1#launch-cluster:
Single Node
Security: https://console.aws.amazon.com/redshift/home?region=us-east-1#launch-cluster:
East, not in VPC, default, no alarms (below)
Review: https://console.aws.amazon.com/redshift/home?region=us-east-1#launch-cluster:
Review
Launch!:
Click
Launch!:
Click
Wait:
Wait, then click
When Active:
You’ll need these details
Connect with SQL Workbench:
Select Connect Window
Connect with SQL Workbench:
Fill this out
Get the JDBC URL
Copy this
Connect with SQL Workbench:
Paste and Fill this out
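If it helps to see what the copied JDBC URL contains, here is a small Python sketch that splits one into its parts. The URL below is a hypothetical placeholder (the cluster name, port and database are assumptions for illustration); use the one shown for your own cluster.

```python
from urllib.parse import urlparse

def parse_jdbc_url(jdbc_url):
    """Split a JDBC URL of the form jdbc:<driver>://<host>:<port>/<database>."""
    prefix = "jdbc:"
    if not jdbc_url.startswith(prefix):
        raise ValueError("not a JDBC URL: " + jdbc_url)
    parsed = urlparse(jdbc_url[len(prefix):])
    return {
        "driver": parsed.scheme,
        "host": parsed.hostname,
        "port": parsed.port,
        "database": parsed.path.lstrip("/"),
    }

# Hypothetical example URL, not a real cluster endpoint.
example_url = "jdbc:postgresql://examplecluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev"
parts = parse_jdbc_url(example_url)
print(parts)
```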
Success!:
New SQL Tab
Add Tab
New SQL Tab
Add Tab
Make Tables
Create Some Tables
CREATE TABLE rankings (
  pageURL VARCHAR(300),
  pageRank INT,
  avgDuration INT
);
CREATE TABLE uservisits (
  sourceIP VARCHAR(116),
  destinationURL VARCHAR(100),
  visitDate DATE,
  adRevenue FLOAT,
  UserAgent VARCHAR(256),
  cCode CHAR(3),
  lCode CHAR(6),
  searchWord VARCHAR(32),
  duration INT
);
Load Data
Load Data from S3
copy uservisits FROM 's3://big-data-benchmark/pavlo/text/tiny/uservisits/'
CREDENTIALS 'aws_access_key_id=<your access key>;aws_secret_access_key=<your secret key>'
delimiter ',';
copy rankings FROM 's3://big-data-benchmark/pavlo/text/tiny/rankings/'
CREDENTIALS 'aws_access_key_id=<your access key>;aws_secret_access_key=<your secret key>'
delimiter ',';
Load Bigger Data
Load Data from S3
's3://big-data-benchmark/pavlo/text/tiny/uservisits/'
-- options: "tiny", "1node", "5nodes", "10nodes"
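To switch dataset sizes without hand-editing the path, a small helper can generate the copy statement. This is a sketch: the helper name and its defaults are assumptions, the credentials stay as placeholders, and it only builds the SQL text for you to paste into SQL Workbench.

```python
# Size options listed on the slide.
DATASET_SIZES = ("tiny", "1node", "5nodes", "10nodes")
BASE_PATH = "s3://big-data-benchmark/pavlo/text"

def copy_statement(table, size,
                   access_key="<your access key>",
                   secret_key="<your secret key>"):
    """Build the copy statement for one table at one dataset size."""
    if size not in DATASET_SIZES:
        raise ValueError("size must be one of: " + ", ".join(DATASET_SIZES))
    return (
        f"copy {table} FROM '{BASE_PATH}/{size}/{table}/' "
        f"CREDENTIALS 'aws_access_key_id={access_key};"
        f"aws_secret_access_key={secret_key}' "
        "delimiter ',';"
    )

for table in ("uservisits", "rankings"):
    print(copy_statement(table, "1node"))
```

Larger sizes take longer to load, so it may be worth starting with "tiny" to confirm the tables and credentials work.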
Simple Queries
Query
SELECT * FROM uservisits LIMIT 100;
SELECT COUNT(*) FROM uservisits;
SELECT * FROM rankings LIMIT 100;
SELECT COUNT(*) FROM rankings;
Complex Queries
Query
SELECT pageURL, pageRank
FROM rankings
WHERE pageRank > 10;

SELECT sourceIP,
  SPLIT_PART(sourceIP, '.', 1) AS fn,
  SPLIT_PART(sourceIP, '.', 2) AS sn
FROM uservisits
LIMIT 100;

SELECT sourceIP, SUM(adRevenue) AS totalRevenue, AVG(pageRank) AS pageRank
FROM rankings R
JOIN (SELECT sourceIP, destinationURL, adRevenue FROM uservisits uv) NUV
  ON (R.pageURL = NUV.destinationURL)
GROUP BY sourceIP
ORDER BY totalRevenue DESC
LIMIT 100;
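To see what the join query returns without a cluster, the Python sketch below runs the same shape of query against an in-memory SQLite database. Everything here is a local stand-in: the rows are made up, SQLite is not Redshift, the tables are trimmed to the columns the query uses, and the average is aliased as avgPageRank to keep it distinct from the column name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rankings (pageURL TEXT, pageRank INT, avgDuration INT);
CREATE TABLE uservisits (sourceIP TEXT, destinationURL TEXT, adRevenue REAL);
-- Made-up sample rows for illustration only.
INSERT INTO rankings VALUES ('a.com', 10, 30), ('b.com', 20, 45);
INSERT INTO uservisits VALUES
  ('1.1.1.1', 'a.com', 0.5),
  ('1.1.1.1', 'b.com', 1.5),
  ('2.2.2.2', 'a.com', 0.25);
""")

query = """
SELECT sourceIP, SUM(adRevenue) AS totalRevenue, AVG(pageRank) AS avgPageRank
FROM rankings R
JOIN (SELECT sourceIP, destinationURL, adRevenue FROM uservisits uv) NUV
  ON (R.pageURL = NUV.destinationURL)
GROUP BY sourceIP
ORDER BY totalRevenue DESC
LIMIT 100;
"""

# One row per visitor IP: total ad revenue and average rank of pages visited.
results = conn.execute(query).fetchall()
for row in results:
    print(row)
conn.close()
```

Each visit is matched to the ranking of the page it landed on, then grouped by visitor so the highest-revenue IPs come first.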
Shut it down!
Click
Shut it down!
Click
Shut it down!
No snapshot
Shut it down!
Thanks … happy querying!
See also
• http://docs.aws.amazon.com/redshift/latest/gsg/getting-started.html