big data analytics preview

Big Data Analytics Preview Ben Sullins Bensullins.com [email protected]

Upload: ben-sullins

Post on 10-May-2015


Category: Technology



DESCRIPTION

A preview of my Big Data Analytics course on Pluralsight.com - Presented at the San Diego Tableau User Group Nov 7th, 2013

TRANSCRIPT

Page 1: Big Data Analytics Preview

Big Data Analytics Preview

Ben Sullins

Bensullins.com

[email protected]

Page 2: Big Data Analytics Preview

Course Outline

Introduction to Big Data

Massively Parallel Processing (MPP) Databases

Cloud Big Data sources

Accessing Big Data with Tableau

Visualizing your Big Data with Tableau

Sharing your work

Page 3: Big Data Analytics Preview

Why Big Data? – New Data Types

BI & Analytics

Business Data

Web Logs

Videos

Images

Sensor Data

3rd Party Apps

Page 4: Big Data Analytics Preview

Why Big Data? – Massive Content

Page 5: Big Data Analytics Preview

Why Big Data? – Variety of Data

Data Volume Growth

Page 6: Big Data Analytics Preview

Why Big Data? – Storage is Cheap!

Hard Drive Costs per GB since 1980

Page 7: Big Data Analytics Preview

Big Data

What it IS:

Unstructured

Petabytes+

Evolution of RDBMS

Many Platforms

Difficult for Analytics

What it IS NOT:

Transactional

Simple or Easy

Structured DW

One Platform

Easy or Fast for Analytics

Page 8: Big Data Analytics Preview

Big Data Storage – Document Stores

[Diagram: data stored as JSON documents, kept as one original and two copies]

Platforms: HDFS, ElasticSearch, CouchDB

Page 9: Big Data Analytics Preview

Big Data Platforms – Platform Vendors

Page 10: Big Data Analytics Preview

Big Data in the Cloud

Page 11: Big Data Analytics Preview
Page 12: Big Data Analytics Preview

Amazon Redshift Architecture

Columnar DB

MPP Architecture

Speed!
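As a rough illustration of why a columnar, massively parallel layout can speed up analytic queries, the sketch below stores a tiny table column by column, deals the rows out to a few node slices, and sums one column by combining per-slice partial sums. The table, the column names, and the `split_into_slices` and `parallel_sum` helpers are all invented for illustration; this is not a description of how Redshift is implemented internally.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table stored column by column rather than row by row.
columns = {
    "region": ["west", "east", "west", "south", "east", "west"],
    "sales": [120, 80, 200, 50, 70, 160],
}

def split_into_slices(values, n_slices):
    """Deal values round-robin into n_slices lists, one per node slice."""
    return [values[i::n_slices] for i in range(n_slices)]

def parallel_sum(values, n_slices=3):
    """Sum one column by adding up partial sums computed per slice."""
    slices = split_into_slices(values, n_slices)
    with ThreadPoolExecutor(max_workers=n_slices) as pool:
        partials = list(pool.map(sum, slices))
    return sum(partials)

# Only the "sales" column is read; the "region" column is never touched.
total_sales = parallel_sum(columns["sales"])
print(total_sales)
```

The point of the sketch is the shape of the work: the query touches only the column it needs, and each slice does its share independently before the results are merged.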

Page 13: Big Data Analytics Preview

Amazon Redshift Scalability

2TB XL Node

High Storage Extra Large (XL) DW Node:

CPU: 2 virtual cores - Intel Xeon E5
Memory: 15 GiB
Storage: 3 HDD with 2TB of local attached storage
Network: Moderate
Disk I/O: Moderate
API: dw.hs1.xlarge

High Storage Eight Extra Large (8XL) DW Node:

CPU: 16 virtual cores - Intel Xeon E5
Memory: 120 GiB
Storage: 24 HDD with 16TB of local attached storage
Network: 10 Gigabit Ethernet with support for cluster placement groups
Disk I/O: Very High
API: dw.hs1.8xlarge

16TB 8XL Node
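For a back-of-the-envelope sense of how the two node sizes scale, the sketch below estimates how many nodes of each type would be needed to hold a given amount of data. It uses only the 2TB and 16TB per-node storage figures from the slide; the `nodes_needed` helper is hypothetical, and the estimate deliberately ignores compression, replication overhead, and any cluster-size limits.

```python
import math

# Local storage per node in TB, taken from the slide.
NODE_STORAGE_TB = {"XL": 2, "8XL": 16}

def nodes_needed(data_tb, node_type):
    """Smallest node count whose combined local storage holds data_tb."""
    return math.ceil(data_tb / NODE_STORAGE_TB[node_type])

# Example: a hypothetical 100 TB dataset.
xl_nodes = nodes_needed(100, "XL")
big_nodes = nodes_needed(100, "8XL")
print(xl_nodes, big_nodes)
```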

Page 14: Big Data Analytics Preview

Amazon Redshift Cost

On-Demand Pricing

DW Node Class (On-Demand)             Hourly
XL Node - 2TB storage (Per Node)      $0.850 per Hour
8XL Node - 16TB storage (Per Node)    $6.800 per Hour

Reserved Instance 1yr (41% savings)

DW Node Class (Reserved)              Up front    Hourly
XL Node - 2TB storage (Per Node)      $2,500      $0.215 per Hour
8XL Node - 16TB storage (Per Node)    $20,000     $1.720 per Hour

Reserved Instance 3yr (73% savings)

DW Node Class (Reserved)              Up front    Hourly
XL Node - 2TB storage (Per Node)      $3,000      $0.114 per Hour
8XL Node - 16TB storage (Per Node)    $24,000     $0.912 per Hour
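The savings percentages on this slide can be checked with a little arithmetic. The sketch below takes the XL node prices from the slide and compares the total cost of each reserved plan against running on demand for the same term. It assumes a node runs around the clock for the full term and that a year has 8,760 hours; the plan names and helper functions are made up for illustration.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760; leap days ignored

# (upfront USD, hourly USD, term in years) for one XL node, from the slide.
XL_PLANS = {
    "on_demand": (0, 0.850, 1),
    "reserved_1yr": (2_500, 0.215, 1),
    "reserved_3yr": (3_000, 0.114, 3),
}
XL_STORAGE_TB = 2

def total_cost(plan):
    """Upfront fee plus hourly charges over the whole term."""
    upfront, hourly, years = XL_PLANS[plan]
    return upfront + hourly * HOURS_PER_YEAR * years

def savings_vs_on_demand(plan):
    """Fraction saved compared with on-demand over the same term."""
    years = XL_PLANS[plan][2]
    on_demand = XL_PLANS["on_demand"][1] * HOURS_PER_YEAR * years
    return 1 - total_cost(plan) / on_demand

def cost_per_tb_year(plan):
    """Average yearly cost per TB of node storage."""
    years = XL_PLANS[plan][2]
    return total_cost(plan) / years / XL_STORAGE_TB

for plan in XL_PLANS:
    print(plan, f"{savings_vs_on_demand(plan):.0%}", round(cost_per_tb_year(plan)))
```

Under these assumptions the reserved plans come out roughly 41% and 73% cheaper than on demand, in line with the slide, and the 3-year plan works out to just under $1,000 per TB per year.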

Page 15: Big Data Analytics Preview

Amazon Redshift Ease of Use

Fully Managed

Fault Tolerant

Automated Backups

Web Interface

Page 16: Big Data Analytics Preview

Amazon Redshift Security

AES-256 bit Encryption

Amazon VPC

Firewall

Page 17: Big Data Analytics Preview

Amazon Redshift Compatibility

Page 18: Big Data Analytics Preview

BigQuery

Page 19: Big Data Analytics Preview
Page 20: Big Data Analytics Preview

Google BigQuery Architecture

Columnar DB

Tree Architecture

Speed!
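The slide names a tree architecture without going into detail. One common way to picture it is a query fanning out from a root through intermediate servers to leaves that each scan a shard, with partial results merged on the way back up. The sketch below is only that mental model in miniature: the `tree` layout, the shard values, and the `count_where` helper are hypothetical and are not taken from BigQuery itself.

```python
# Hypothetical serving tree: leaves hold column shards, inner nodes hold children.
tree = {
    "children": [
        {"children": [{"shard": [3, 8, 15]}, {"shard": [22, 4]}]},
        {"children": [{"shard": [9, 30, 1]}, {"shard": [12]}]},
    ]
}

def count_where(node, predicate):
    """Count matching values: leaves scan their shard, inner nodes merge child counts."""
    if "shard" in node:
        return sum(1 for value in node["shard"] if predicate(value))
    return sum(count_where(child, predicate) for child in node["children"])

# Count the values greater than 8 across every shard.
matches = count_where(tree, lambda value: value > 8)
print(matches)
```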

Page 21: Big Data Analytics Preview

Google BigQuery on Speed

“Dremel can scan 35 billion rows without an index in tens of seconds” – Solutions Architect, Google Cloud Solutions Team

Page 22: Big Data Analytics Preview

Google BigQuery Scalability

?

Page 23: Big Data Analytics Preview

Google BigQuery Cost

On-Demand Pricing

Resource               Pricing
Storage                $80 (per TB/month)
Interactive Queries    $35 (per TB processed)
Batch Queries          $20 (per TB processed)

Packaged Pricing

Data        Cost
100 TB      $3,300 per month ($33 per TB)
400 TB      $12,000 per month ($30 per TB)
1,500 TB    $40,500 per month ($27 per TB)
4,000 TB    $100,000 per month ($25 per TB)

• Packages are billed in full at the end of each month, whether the package is used or not.

• If you use more data than the amount in your chosen package, on-demand rates apply for any additional data.
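Because packages are billed in full and overage falls back to on-demand rates, it can help to work through which option is cheapest for a given month. The sketch below does that with the prices from the slide. It assumes the package sizes refer to TB of query data processed and that overage is charged at the $35 per TB interactive rate; the slide does not state either point explicitly, and the function names are made up for illustration.

```python
# Monthly package prices from the slide: package size in TB -> USD per month.
PACKAGES = {100: 3_300, 400: 12_000, 1_500: 40_500, 4_000: 100_000}

# Assumption: overage is billed at the interactive on-demand rate (USD per TB).
ON_DEMAND_PER_TB = 35

def on_demand_cost(tb_processed):
    """Cost of paying per TB with no package."""
    return tb_processed * ON_DEMAND_PER_TB

def package_cost(package_tb, tb_processed):
    """Package fee is billed in full; usage beyond it is charged on demand."""
    overage_tb = max(0, tb_processed - package_tb)
    return PACKAGES[package_tb] + on_demand_cost(overage_tb)

def cheapest_option(tb_processed):
    """Return (option name, monthly cost) with the lowest cost."""
    options = {"on-demand": on_demand_cost(tb_processed)}
    for package_tb in PACKAGES:
        options[f"{package_tb} TB package"] = package_cost(package_tb, tb_processed)
    return min(options.items(), key=lambda item: item[1])

print(cheapest_option(50))
print(cheapest_option(120))
```

Under these assumptions, light usage is cheaper on demand, while usage a little above a package size can still favor the package plus overage.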

Page 24: Big Data Analytics Preview

Google BigQuery: Compatibility

Page 25: Big Data Analytics Preview

Cloud Big Data Sources Comparison

Amazon Redshift

Columnar + MPP

Petabytes in Scale

Easy management interface

Straightforward billing ($1K/TB/Yr)

Great connectivity w/ BI Tools

Google BigQuery

Columnar + Tree

Infinite Scalability

No Management Required

Confusing Pricing Model

Fair Connectivity w/ BI Tools

Page 26: Big Data Analytics Preview

bensullins.com