AWS Partner Webcast - Fast Insights from Disparate Data with Amazon Redshift and ClearStory Data
DESCRIPTION
Organizations relying on data intelligence depend on fast-cycle analysis. In this webinar, you will learn how to use ClearStory and Amazon Redshift to quickly bring together more sources of internal and external data, and discover new insights. ClearStory is an integrated application and platform, speeding analysis across internal & external data. You can access internal and external data, speed data manipulation by up to 80%, and see insights through visual data updates.
What you’ll learn:
- How Amazon Redshift & ClearStory Data bring a new approach to speed in-depth analysis across diverse data sets
- How ClearStory’s simple, collaborative user model and visualization features ease discovering insights
- Customer case studies for the joint solution across retail, consumer packaged goods and online businesses
TRANSCRIPT
Get Insights from Disparate Data Quickly with Amazon Redshift and ClearStory Data
Webinar Overview
Submit Your Questions using the Q&A tool.
A copy of today’s presentation will be made available on:
AWS SlideShare Channel @ http://www.slideshare.net/AmazonWebServices/
AWS Webinar Channel on YouTube @ http://www.youtube.com/channel/UCT-nPlVzJI-ccQXlxjSvJmw
Introducing
Tina Adams, Product Manager, Amazon Web Services
Vaibhav Nivargi, Technical Co-founder, ClearStory Data
What We’ll Cover
Overview of Amazon Redshift data warehouse
How ClearStory’s simple and collaborative user model eases discovering insights in Amazon Redshift
Demo
Q&A
Amazon Redshift
Fast, simple, petabyte-scale data warehousing for less than $1,000/TB/Year
Amazon Redshift Architecture
• Leader Node
– SQL endpoint
– Stores metadata
– Coordinates query execution
• Compute Nodes
– Local, columnar storage
– Execute queries in parallel
– Load, backup, restore via Amazon S3; load from Amazon DynamoDB or SSH
• Two hardware platforms
– Optimized for data processing
– DW1: HDD; scale from 2TB to 1.6PB
– DW2: SSD; scale from 160GB to 256TB
[Architecture diagram labels: JDBC/ODBC; 10 GigE (HPC); Ingestion, Backup, Restore]
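The division of labor on the architecture slide (a leader node that coordinates, compute nodes that execute in parallel) can be illustrated with a small scatter-gather sketch. This is only a toy model under stated assumptions: the sample rows, the function names, and the use of threads to stand in for compute nodes are all hypothetical and do not reflect how Amazon Redshift is implemented internally.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands for the rows stored on one compute node.
NODE_SLICES = [
    [("CA", 120.0), ("NY", 80.0)],
    [("CA", 30.0), ("TX", 55.0)],
    [("NY", 20.0), ("TX", 45.0)],
]


def partial_sum_by_state(rows):
    """Work done on one compute node: aggregate only the locally stored rows."""
    totals = {}
    for state, amount in rows:
        totals[state] = totals.get(state, 0.0) + amount
    return totals


def leader_query(node_slices):
    """Work done on the leader: fan the query out, then merge partial results."""
    with ThreadPoolExecutor(max_workers=len(node_slices)) as pool:
        partials = list(pool.map(partial_sum_by_state, node_slices))
    merged = {}
    for partial in partials:
        for state, subtotal in partial.items():
            merged[state] = merged.get(state, 0.0) + subtotal
    return merged


result = leader_query(NODE_SLICES)
print(result)
```

The point of the sketch is that each node touches only its own slice of the data, so adding nodes shrinks the per-node work while the leader's merge step stays small.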
Amazon Redshift is priced to let you analyze all your data
• Number of nodes x cost per hr
• No charge for leader node
• No upfront costs
• Pay as you go
DW1 (HDD), DW1.XL Single Node
                      Price Per Hour    Effective Annual Price per TB
On-Demand             $0.850            $3,723
1 Year Reservation    $0.500            $2,190
3 Year Reservation    $0.228            $999

DW2 (SSD), DW2.L Single Node
                      Price Per Hour    Effective Annual Price per TB
On-Demand             $0.250            $13,688
1 Year Reservation    $0.161            $8,794
3 Year Reservation    $0.100            $5,498
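The "Effective Annual Price per TB" column appears to follow from the hourly price, the hours in a year, and the storage on one node. The sketch below works through that arithmetic. It assumes a DW1.XL node holds 2 TB and a DW2.L node holds 0.16 TB (160 GB), which is inferred from the minimum cluster sizes on the architecture slide rather than stated on the pricing slide. Under that assumption the formula reproduces all three DW1 figures and the DW2 on-demand figure. The two DW2 reservation figures come out slightly different ($8,815 and $5,475 instead of $8,794 and $5,498), presumably because the hourly rates shown on the slide are rounded.

```python
from decimal import Decimal, ROUND_HALF_UP

HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def effective_annual_price_per_tb(price_per_hour, node_storage_tb):
    """Annual cost of one node divided by that node's storage, in whole dollars.

    Both arguments are passed as strings so Decimal keeps them exact.
    """
    annual_cost = Decimal(price_per_hour) * HOURS_PER_YEAR
    per_tb = annual_cost / Decimal(node_storage_tb)
    return int(per_tb.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# Assumed node sizes: DW1.XL = 2 TB, DW2.L = 0.16 TB (160 GB).
dw1_on_demand = effective_annual_price_per_tb("0.850", "2")
dw1_three_year = effective_annual_price_per_tb("0.228", "2")
dw2_on_demand = effective_annual_price_per_tb("0.250", "0.16")

print(dw1_on_demand, dw1_three_year, dw2_on_demand)
```

The DW1 three-year result is the basis for the "less than $1,000/TB/Year" headline.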
Amazon Redshift Feature Delivery
Service Launch (2/14)
PDX (4/2)
Temp Credentials (4/11)
Unload Encrypted Files
DUB (4/25)
NRT (6/5)
JDBC Fetch Size (6/27)
Unload logs (7/5)
4 byte UTF-8 (7/18)
Statement Timeout (7/22)
SHA1 Builtin (7/15)
Timezone, Epoch, Autoformat (7/25)
WLM Timeout/Wildcards (8/1)
CRC32 Builtin, CSV, Restore Progress (8/9)
UTF-8 Substitution (8/29)
JSON, Regex, Cursors (9/10)
Split_part, Audit tables (10/3)
SIN/SYD (10/8)
HSM Support (11/11)
Kinesis EMR/HDFS/SSH copy, Distributed Tables, Audit Logging/CloudTrail, Concurrency, Resize Perf., Approximate Count Distinct, SNS Alerts (11/13)
SOC1/2/3 (5/8)
Sharing snapshots (7/18)
Resource Level IAM (8/9)
PCI (8/22)
Distributed Tables, Single Node Cursor Support, Maximum Connections to 500 (12/13)
EIP Support for VPC Clusters (12/28)
New query monitoring system tables and diststyle all (1/13)
Redshift on DW2 (SSD) Nodes (1/23)
Compression for COPY from SSH, Fetch size support for single node clusters, new system tables with commit stats, row_number(), strtol() and query termination (2/13)
Resize progress indicator & Cluster Version (3/21)
Regex_Substr, COPY from JSON (3/25)
Amazon Redshift integrates with multiple data sources
Amazon S3
Amazon EMR
Amazon Redshift
DynamoDB
Amazon RDS
Corporate Datacenter
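Loading from several of these sources goes through the Redshift COPY command, whose FROM clause accepts `s3://`, `emr://`, and `dynamodb://` locations. The sketch below only assembles the SQL text and does not connect to a cluster. The helper name, table names, bucket, and IAM role ARN are hypothetical placeholders. It uses the `IAM_ROLE` authorization clause; at the time of this webcast the `CREDENTIALS` clause was the usual form. Real loads typically also need format options (delimiter, compression, and so on) that are omitted here.

```python
SUPPORTED_SCHEMES = ("s3", "emr", "dynamodb")


def build_copy_statement(table, source_uri, iam_role, read_ratio=50):
    """Assemble a Redshift COPY statement as text (nothing is executed)."""
    scheme = source_uri.split("://", 1)[0]
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported COPY source: {source_uri}")
    parts = [
        f"COPY {table}",
        f"FROM '{source_uri}'",
        f"IAM_ROLE '{iam_role}'",
    ]
    if scheme == "dynamodb":
        # COPY from DynamoDB requires READRATIO: the percentage of the
        # table's provisioned read throughput the load may consume.
        parts.append(f"READRATIO {read_ratio}")
    return "\n".join(parts) + ";"


# Hypothetical role, bucket, and table names, for illustration only.
ROLE = "arn:aws:iam::123456789012:role/ExampleRedshiftLoadRole"
s3_sql = build_copy_statement("sales", "s3://example-bucket/sales/", ROLE)
ddb_sql = build_copy_statement("events", "dynamodb://ExampleEvents", ROLE)
print(s3_sql)
print(ddb_sql)
```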
ClearStory Data Solution, June 3rd, 2014
ClearStory Solution
Faster, Interactive Insights for LOB Users & Data Analysts
Fast, blended insights combining private data sources and external sources with a new interactive and collaborative user model
Distributed Teams
Business Users
Data Analysts
Data Stewards
Why Companies Choose ClearStory
1. Fast-cycle analysis accessing & converging data from multiple sources.
How? Data inference eliminates traditional data modeling steps + automated Data Harmonization/blending
2. On-going diagnostic and exploratory analysis
How? Uncover insights faster as data updates at source
3. Interactive and collaborative user model to speed observations & decisions.
How? A new user model with collaboration, for any skillset
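To make the idea of "data inference" in point 1 concrete, here is a minimal sketch of inferring column types from sample values instead of declaring a schema by hand. It is a generic illustration only and not ClearStory's actual inference method, which the slides do not describe. The function names, the sample rows, and the choice of four candidate types are all assumptions.

```python
from datetime import datetime


def infer_type(values):
    """Return 'integer', 'float', 'date' or 'text' for a list of raw strings."""

    def all_parse(parser):
        # True only if every value can be parsed by the given parser.
        try:
            for value in values:
                parser(value)
            return True
        except ValueError:
            return False

    # Try the narrowest type first and fall back to text.
    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "float"
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    return "text"


def infer_schema(rows):
    """Infer a type for every column of a list of dict rows."""
    columns = rows[0].keys()
    return {col: infer_type([row[col] for row in rows]) for col in columns}


# Hypothetical sample rows, as they might arrive from a CSV export.
sample = [
    {"store_id": "17", "sale_date": "2014-06-01", "amount": "19.99", "region": "West"},
    {"store_id": "42", "sale_date": "2014-06-02", "amount": "5", "region": "East"},
]
schema = infer_schema(sample)
print(schema)
```

A production system would also have to handle missing values, mixed formats, and semantic types such as zip codes or currencies, which this sketch ignores.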
Data Sources
- Redshift, RDBMS, EDW
- Big Data platforms
- External Premium/Public data
ClearStory Integrated Application & Processing Platform
Data Inference & Profiling
Harmonization
Visualization
Collaboration
Data Convergence
ClearStory Platform ClearStory Application
Customer Examples
Healthcare Provider
• Determine patterns on doctor diagnosis and clinical trials
• Reconcile and harmonize all data streams by zip code
Major Retailer – Close Rate Analysis by Location
• Identify factors affecting product sales by location/zip code and time of day
CPG – Daily analysis on dairy product sales by location
• Diverse data sources converged in Redshift, updating daily
• ClearStory for harmonized, holistic, intra-day analysis
“Answers that used to take 60 to 90 days to pull together can now be seen in a day. The speed of execution is dramatically different than what it was before.”
CPG Leader in Dairy
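Several of these examples depend on reconciling data sets by zip code. The sketch below shows one simple way such a blend could work: normalize the zip code in each source to a common five-digit form, aggregate the internal data, and join it to an external data set. This is a generic illustration and not ClearStory's harmonization logic. The field names, the sample records, and the population figure are hypothetical.

```python
from collections import defaultdict


def normalize_zip(raw):
    """Reduce a zip code in any common form to a five-digit string."""
    digits = "".join(ch for ch in str(raw) if ch.isdigit())
    return digits[:5].zfill(5)


def blend_by_zip(sales, demographics):
    """Total sales per zip code, joined to external demographics if present."""
    totals = defaultdict(float)
    for row in sales:
        totals[normalize_zip(row["zip"])] += row["amount"]
    demo_by_zip = {normalize_zip(r["zipcode"]): r for r in demographics}
    blended = []
    for zip_code in sorted(totals):
        demo = demo_by_zip.get(zip_code)
        blended.append({
            "zip": zip_code,
            "sales": totals[zip_code],
            "population": demo["population"] if demo else None,
        })
    return blended


# Hypothetical internal sales rows with inconsistent zip formats.
sales = [
    {"zip": "94301", "amount": 100.0},
    {"zip": "94301-1234", "amount": 50.0},  # ZIP+4 form
    {"zip": 2138, "amount": 25.0},          # leading zero lost as a number
]
# Hypothetical external data set; the population value is made up.
demographics = [{"zipcode": "94301", "population": 1000}]

blended = blend_by_zip(sales, demographics)
print(blended)
```

Rows without a match in the external source are kept with a population of None, so gaps in the external data stay visible rather than silently dropping sales.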
What You Will See Today
1. Speed to Interactive Insights
Accessing Redshift + automated data inference + data harmonization/blending
2. Fast Interactive, Collaborative insights
3. Analysis self-updates as data updates (Data Stories updating intra-day, daily, weekly…)
For Data Stewards and LOB Users
ClearStory & Redshift Demo
Request a Trial
www.clearstorydata.com
Contacts and Q&A
Contacts:
ClearStory Data: http://www.clearstorydata.com/ email: [email protected]
AWS Contact: aws.amazon.com/contact-us [email protected]