Big Data Analytics Preview
DESCRIPTION
A preview of my Big Data Analytics course on Pluralsight.com - Presented at the San Diego Tableau User Group Nov 7th, 2013
TRANSCRIPT
Course Outline
Introduction to Big Data
Massively Parallel Processing (MPP) Databases
Cloud Big Data sources
Accessing Big Data with Tableau
Visualizing your Big Data with Tableau
Sharing your work
Why Big Data? - New Data Types
BI & Analytics
Business Data
Web Logs
Videos
Images
Sensor Data
3rd Party Apps
Why Big Data? – Massive Content
Why Big Data? – Variety of Data
Data Volume Growth
Why Big Data? – Storage is Cheap!
Hard Drive Costs per GB since 1980
Big Data
What it IS
Unstructured
Petabytes+
Evolution of RDBMS
Many Platforms
Difficult for Analytics
What it IS NOT
Transactional
Simple or Easy
Structured DW
One Platform
Easy or Fast for Analytics
Big Data Storage – Document Stores
[Diagram: data stored as JSON documents, shown as one original and two copies]
Platforms: HDFS, ElasticSearch, CouchDB
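As a rough illustration of the document model on this slide, the sketch below builds a single JSON document in Python. The record and its field names are hypothetical and are not taken from the course.

```python
import json

# Hypothetical web-log event; every field name here is an illustrative assumption.
event = {
    "type": "web_log",
    "timestamp": "2013-11-07T18:30:00Z",
    "url": "/courses/big-data-analytics",
    "status": 200,
    "tags": ["tableau", "preview"],
}

# Serialize to a JSON document, the unit a document store would hold.
document = json.dumps(event, sort_keys=True)

# Round-trip back to a Python dict to show the structure survives intact.
restored = json.loads(document)
print(document)
```

Because each document carries its own field names, records of different shapes can sit side by side without a fixed schema.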
Big Data Platforms – Platform Vendors
Big Data in the Cloud
Amazon Redshift Architecture
Columnar DB MPP Architecture Speed!
Amazon Redshift Scalability
2TB XL Node
High Storage Extra Large (XL) DW Node:
CPU: 2 virtual cores - Intel Xeon E5
Memory: 15 GiB
Storage: 3 HDD with 2TB of local attached storage
Network: Moderate
Disk I/O: Moderate
API: dw.hs1.xlarge
16TB 8XL Node
High Storage Eight Extra Large (8XL) DW Node:
CPU: 16 virtual cores - Intel Xeon E5
Memory: 120 GiB
Storage: 24 HDD with 16TB of local attached storage
Network: 10 Gigabit Ethernet with support for cluster placement groups
Disk I/O: Very High
API: dw.hs1.8xlarge
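A back-of-the-envelope sketch of how the two node sizes scale: divide the data volume by the local storage per node and round up. This is a simplification that assumes raw capacity only; it ignores compression, replication overhead, and any cluster-size limits, none of which are stated on the slide. The function name is made up for illustration.

```python
import math

# Local attached storage per node in TB, as listed on the slide.
NODE_STORAGE_TB = {"XL": 2, "8XL": 16}

def nodes_needed(data_tb, node_type):
    """Smallest node count whose combined local storage holds data_tb."""
    return math.ceil(data_tb / NODE_STORAGE_TB[node_type])

print(nodes_needed(100, "XL"))   # 50
print(nodes_needed(100, "8XL"))  # 7
```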
Amazon Redshift Cost
On-Demand Pricing
DW Node Class (On-Demand) | Hourly
XL Node - 2TB storage (Per Node) | $0.850 per Hour
8XL Node - 16TB storage (Per Node) | $6.800 per Hour
Reserved Instance 1yr (41% savings)
DW Node Class (Reserved) | Up front | Hourly
XL Node - 2TB storage (Per Node) | $2,500 | $0.215 per Hour
8XL Node - 16TB storage (Per Node) | $20,000 | $1.720 per Hour
Reserved Instance 3yr (73% savings)
DW Node Class (Reserved) | Up front | Hourly
XL Node - 2TB storage (Per Node) | $3,000 | $0.114 per Hour
8XL Node - 16TB storage (Per Node) | $24,000 | $0.912 per Hour
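The savings percentages can be checked from the listed prices. The sketch below assumes a node runs continuously for the whole term (8,760 hours per year) and that the $2,500 / $0.215 rates belong to the 1-year term and the $3,000 / $0.114 rates to the 3-year term. The helper names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760; ignores leap years

def total_cost(upfront, hourly, years):
    """Cost of running one node continuously for the given number of years."""
    return upfront + hourly * HOURS_PER_YEAR * years

def savings(upfront, hourly, on_demand_hourly, years):
    """Fraction saved versus paying the on-demand rate over the same term."""
    on_demand = total_cost(0, on_demand_hourly, years)
    return 1 - total_cost(upfront, hourly, years) / on_demand

# XL node (2TB) figures from the slide.
one_year = savings(2500, 0.215, 0.850, years=1)
three_year = savings(3000, 0.114, 0.850, years=3)
per_tb_year = total_cost(3000, 0.114, 3) / 3 / 2  # spread over 3 years and 2TB

print(f"1yr savings: {one_year:.0%}")                     # 41%
print(f"3yr savings: {three_year:.0%}")                   # 73%
print(f"3yr cost per TB per year: ${per_tb_year:,.0f}")   # $999
```

Under these assumptions the 3-year reserved rate works out to roughly $1,000 per TB per year.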
Amazon Redshift Ease of Use
Fully Managed
Fault Tolerant
Automated Backups
Web Interface
Amazon Redshift Security
AES-256 bit Encryption
Amazon VPC
Firewall
Amazon Redshift Compatibility
BigQuery
Google BigQuery Architecture
Columnar DB Tree Architecture Speed!
Google BigQuery on Speed
“Dremel can scan 35 billion rows without an index in tens of seconds” – Solutions Architect, Google Cloud Solutions Team
Google BigQuery Scalability
?
Google BigQuery Cost
On-Demand Pricing
Resource | Pricing
Storage | $80 (per TB/month)
Interactive Queries | $35 (per TB processed)
Batch Queries | $20 (per TB processed)
Packaged Pricing
Data | Cost
100 TB | $3,300 per month ($33 per TB)
400 TB | $12,000 per month ($30 per TB)
1,500 TB | $40,500 per month ($27 per TB)
4,000 TB | $100,000 per month ($25 per TB)
• Packages are billed in full at the end of each month, whether the package is used or not.
• If you use more data than the amount in your chosen package, on-demand rates apply for any additional data.
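To see how the two package rules interact, here is a minimal sketch that compares pure on-demand billing against each package for a given monthly volume. It assumes the packages cover interactive query processing and that overage is charged at the $35 per TB interactive rate; the slide does not say which on-demand rate applies, so treat that as an assumption. Function names are illustrative.

```python
# Package tiers from the slide: (included TB, monthly price in USD).
PACKAGES = [(100, 3300), (400, 12000), (1500, 40500), (4000, 100000)]
ON_DEMAND_PER_TB = 35  # interactive query rate; assumed to apply to overage

def on_demand_cost(tb_processed):
    """Monthly cost with no package at all."""
    return tb_processed * ON_DEMAND_PER_TB

def package_cost(tb_processed, included_tb, price):
    """Package is billed in full; extra data is charged at the on-demand rate."""
    overage_tb = max(0, tb_processed - included_tb)
    return price + overage_tb * ON_DEMAND_PER_TB

def cheapest_option(tb_processed):
    """Return (label, monthly cost) of the cheapest way to process tb_processed."""
    options = [("on-demand", on_demand_cost(tb_processed))]
    options += [(f"{tb} TB package", package_cost(tb_processed, tb, price))
                for tb, price in PACKAGES]
    return min(options, key=lambda option: option[1])

print(cheapest_option(50))   # ('on-demand', 1750)
print(cheapest_option(120))  # ('100 TB package', 4000)
```

Because a package is billed whether or not it is used, the 100 TB package only pays off once monthly volume passes roughly 94 TB ($3,300 / $35).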
Google BigQuery Compatibility
Cloud Big Data Sources Comparison
Amazon Redshift
Columnar + MPP
Petabytes in Scale
Easy management interface
Straightforward billing ($1K/TB/Yr)
Great connectivity w/ BI Tools
Google BigQuery
Columnar + Tree
Infinite Scalability
No Management Required
Confusing Pricing Model
Fair Connectivity w/ BI Tools
bensullins.com