google cloud big data summit master gcp big data summit la - 10-20-2015

Big Data Summit - Google Venice October 20, 2015

Upload: raj-babu

Post on 15-Jan-2017


Category:

Data & Analytics



TRANSCRIPT

Page 1: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Big Data Summit - Google Venice, October 20, 2015

Page 2: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Agenda

2:00 – 2:30  Registration & Welcome
2:30 – 3:30  GCP Big Data Overview by Rohit Khare, Google PM
3:30 – 4:00  Customer Stories - BlueCava & Pixalate
4:00 – 4:30  Panel Discussion, Q&A
4:30 – 5:00  Partner Story, Magnus Unum
5:00 – 6:00  Reception & Networking

Page 3: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Logistics

● Parking behind Chaya Restaurant on Navy Street
● Visitor badges
● Washrooms
● Beverage & food service
● Wireless access “GoogleGuest”

Page 5: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Confidential & ProprietaryGoogle Cloud Platform 5

Build. Store. Analyze.
Google Cloud Platform for Big Data
Focus on insights, not infrastructure

Big Data Summit, Los Angeles, October 20, 2015

Rohit Khare, Google Cloud Product Manager
William Vambenepe, Lead Product Manager for Big Data

Page 6: Google cloud big data summit   master gcp big data summit la - 10-20-2015
Page 7: Google cloud big data summit   master gcp big data summit la - 10-20-2015
Page 8: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Build Connect Visualize Find Access

Page 9: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Cloud Computing

IaaS: Infrastructure-as-a-Service
PaaS: Platform-as-a-Service
SaaS: Software-as-a-Service

Google Cloud Platform

Page 10: Google cloud big data summit   master gcp big data summit la - 10-20-2015


Enterprise Cloud Platform market will exceed $43B globally by 2018.

2013

Page 11: Google cloud big data summit   master gcp big data summit la - 10-20-2015

IT Trends

Affordable capacity: The decreasing cost of storage enables virtually unlimited storage in the cloud. $600 can buy enough storage for the world’s music. (Source: McKinsey Global Institute, May 2011)

On-demand computing: Computing as a utility is now available for easy purchase, provided from massively efficient data centers. (Source: Nicholas Carr, The Big Switch, 2008)

Instant access: The internet allows for a model of real-time access to new innovation, information and applications from a wide range of devices.

Page 12: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Cloud Computing Patterns

Growing Fast
• Successful services need to grow/scale
• Keeping up with growth is a big IT challenge
• Cannot provision hardware fast enough

On and Off
• On & off workloads (e.g. batch job)
• Over-provisioned capacity is wasted
• Time to market can be cumbersome

Page 13: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Cloud Computing Patterns

Predictable Bursting
• Services with micro seasonality trends
• Peaks due to periodic increased demand
• IT complexity and wasted capacity

Unpredictable Bursting
• Unexpected/unplanned peak in demand
• Sudden spike impacts performance
• Can’t over-provision for extreme cases

Page 14: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Cloud Economics: 10x cost benefit for large scale deployments

[Chart: dollar axis from $0 to $8,000; horizontal axis 100, 1,000, 10,000 and 100,000 servers; series: public cloud, private cloud]

Page 15: Google cloud big data summit   master gcp big data summit la - 10-20-2015


Google Cloud Platform

Google Ecosystem + APIs

• Take advantage of Google’s entire ecosystem of services:

Search

Web analytics

Monetization

App Distribution

Page 16: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Support

Bronze (Free): We provide all of our customers with Bronze support, giving you access to online documentation, community forums, and billing support.

Silver ($150/month): If you want direct access to our support team for questions related to service functionality, best practice architectures, and service errors.

Gold (starts at $400/month): If you want 24 x 7 phone support, more rapid target initial response times, and consultation on application development and architecture for your specific use case.

Platinum (Contact Sales): If you want the most comprehensive, personal and customized support we offer. Includes everything in Gold support as well as direct access to the Technical Account Management team.

Page 17: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Certifications

Product    SSAE-16 SOC 1  SSAE-16 SOC 2  SSAE-16 SOC 3  ISO 27001  HIPAA (BAA)  PCI DSS v3.0  FISMA             FedRAMP
GAE        Complete       Complete       Complete       Complete   H1 15        Complete      FISMA (Moderate)  H2 15
GCS        Complete       Complete       Complete       Complete   Complete     Complete      n/a               H2 15
GCE        Complete       Complete       Complete       Complete   Complete     Complete      n/a               H2 15
Datastore  Complete       Complete       Complete       Complete   H1 15        Complete      n/a               H2 15
BigQuery   Complete       Complete       Complete       Complete   Complete     Complete      n/a               H2 15
Cloud SQL  Complete       Complete       Complete       Complete   Complete     Complete      n/a               H2 15
Genomics   H1 15          H1 15          H1 15          Complete   H1 15        n/a           n/a               H2 15
Apps       Complete       Complete       Complete       Complete   Complete     n/a           GAFG only         H2 15

Page 18: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Pricing

Philosophy: Pricing should be flexible and easy to understand. You shouldn’t need a PhD to understand prices, and you should get the best price automatically.

Sustained Use Discounts: If you use a Compute Engine VM for more than 25% of a month, you receive discounts automatically.

Per Minute Billing: Compute Engine instances are charged in one-minute increments (with a 10 minute min), so you only pay for what you use.
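The per-minute billing rule on this slide (one-minute increments with a 10 minute minimum) can be sketched as a small calculation. This is an illustrative sketch only: the function name is hypothetical, and rounding a partial minute up to the next whole minute is an assumption that the slide does not state.

```python
import math

# From the slide: instances are billed per minute with a 10-minute minimum.
MIN_BILLED_MINUTES = 10


def billed_minutes(runtime_seconds: float) -> int:
    """Return the minutes billed for one instance run.

    Assumption (not stated on the slide): a partial minute is rounded up
    to the next whole minute.
    """
    if runtime_seconds <= 0:
        return 0
    return max(MIN_BILLED_MINUTES, math.ceil(runtime_seconds / 60))


# A 30-second run is still billed at the 10-minute minimum,
# while a one-hour run is billed for exactly 60 minutes.
short_run = billed_minutes(30)
hour_run = billed_minutes(3600)
```

Under these assumptions, a batch job that runs for 10 minutes and 1 second would be billed for 11 minutes rather than a full hour.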

Page 19: Google cloud big data summit   master gcp big data summit la - 10-20-2015

For the past 15 years, Google has been building out one of the fastest, most powerful, highest-quality cloud infrastructures on the planet.

Page 20: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Cloud Platform is built on the same infrastructure that powers Google.

Page 21: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Google Innovations in Software

[Timeline with axis marks 2002, 2004, 2006, 2008, 2010, 2012, 2013, 2014. Entries: GFS, MapReduce, Big Table, Dremel, Colossus, Spanner, Dataflow, Kubernetes]

Page 22: Google cloud big data summit   master gcp big data summit la - 10-20-2015


A look inside Google Cloud Platform

Page 23: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Google Cloud Platform

Networking, Compute, Big Data, Management, Storage, Mobile, Developer Tools

Page 24: Google cloud big data summit   master gcp big data summit la - 10-20-2015


Google Cloud Platform

Compute

Compute Engine

Container Engine

App Engine

Page 25: Google cloud big data summit   master gcp big data summit la - 10-20-2015


Google Cloud Platform

Storage

Cloud Storage

Cloud SQL

Cloud Datastore

Cloud Bigtable

Page 26: Google cloud big data summit   master gcp big data summit la - 10-20-2015


NoSQL SQL Blob Block

Easy-to-use storage options

Page 27: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Cloud Storage
Google Cloud Platform

Page 28: Google cloud big data summit   master gcp big data summit la - 10-20-2015


Google Cloud Platform

Cloud Storage: Value

• Safe: Redundant storage at multiple physical locations. OAuth and granular access controls form strong, configurable security

• Ease of Use: Same APIs as other CGS products

• High Performance: We provide a 99.95% SLA and 24x7 phone support

• Pricing: Pay only for what you use with some of the lowest prices in the industry

Page 29: Google cloud big data summit   master gcp big data summit la - 10-20-2015


Google Cloud Platform

Cloud Storage: Features

• 3 storage options

○ Standard: The highest level of durability, availability and performance

○ DRA: High level of durability, availability and performance

○ Nearline: High performance data archiving, online backup, and disaster recovery

Page 30: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Cloud Datastore
Google Cloud Platform

Page 31: Google cloud big data summit   master gcp big data summit la - 10-20-2015


Google Cloud Platform

Cloud Datastore: Value

• Accessible Anywhere

• Secure Sharing

• Same High Replication Datastore Used By App Engine Apps Today

• Equally Fast Queries For Any Sized Dataset

• Data is Replicated Across Several Data Centers

• Use From Any Application or Language

• Serving 4.5 Trillion Requests Per Month

Page 32: Google cloud big data summit   master gcp big data summit la - 10-20-2015


Google Cloud Platform

Cloud Datastore: Features

• Auto-scale

• Schemaless Access

• SQL-like Capabilities

• Authentication That Just Works

• Fast and Easy Provisioning

• RESTful Endpoints

• ACID Transactions

• Local Development Tools

• Built-in Redundancy

Page 33: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Cloud SQL
Google Cloud Platform

Page 34: Google cloud big data summit   master gcp big data summit la - 10-20-2015


Google Cloud Platform

Cloud SQL

• Fully managed

• Ease of Use

• Highly Reliable

• Flexible Charging

• Security, Availability, Durability

• EU and US Data Centers

• Easy Migration & Data Portability

• Control

Page 35: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Cloud Bigtable
Google Cloud Platform

Page 36: Google cloud big data summit   master gcp big data summit la - 10-20-2015


Google Cloud Platform

Big Data

BigQuery

Cloud Pub/Sub

Cloud Dataflow

Page 37: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Manage the Entire Lifecycle of Big Data

Capture, Store, Process, Analyze

Page 38: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Manage the Entire Lifecycle of Big Data

Cloud Logs

Google App Engine

Google Analytics Premium

Cloud Pub/Sub

BigQuery Storage (tables)

Cloud Bigtable (NoSQL)

Cloud Storage (files)

Cloud Dataflow

BigQuery Analytics (SQL)

Capture Store Analyze

Batch

Cloud Datastore

Process

Stream

Cloud Monitoring

Cloud Bigtable

Real time analytics and Alerts

Cloud Dataflow

Cloud Dataproc

Page 39: Google cloud big data summit   master gcp big data summit la - 10-20-2015

BigQuery
Google Cloud Platform

Page 40: Google cloud big data summit   master gcp big data summit la - 10-20-2015


Google Cloud Platform

BigQuery: Value

● Performance: Ingest data at 100K rows/second and process real-time queries

● Ease of use: No administration for performance and scale

● Scale: No need to worry about growing data. Unlimited storage with a pay-as-you-go pricing model

● Non-technical analysts can drive queries on massive datasets using BI tools

Page 41: Google cloud big data summit   master gcp big data summit la - 10-20-2015


Google Cloud Platform

BigQuery: Features

● Interactive query performance: Query multi-terabyte datasets in an ad hoc manner

● SQL: Familiar SQL-like query syntax and intuitive user interface

● Data mashup: Query across diverse datasets

● Highly Available: Data replication in multiple geographies. Data is available and durable even in the case of extreme failure modes

● Secure: Access to data is controlled using customer-owned ACLs

Page 42: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Cloud Pub/Sub
Google Cloud Platform

Page 43: Google cloud big data summit   master gcp big data summit la - 10-20-2015


Google Cloud Platform

Cloud Pub/Sub: Value

● Brings scalable, flexible, and reliable enterprise message-oriented middleware to the cloud

● Provides asynchronous messaging, allowing secure and highly available communication between independently written applications

● Delivers low-latency, durable messaging that helps developers quickly integrate systems hosted on the Google Cloud Platform and externally

Page 44: Google cloud big data summit   master gcp big data summit la - 10-20-2015


Google Cloud Platform

Cloud Pub/Sub: Features

• Unified messaging: Durability and low-latency delivery in a single product

• Global presence: Connect services located anywhere in the world

• Flexible delivery options: Both push- and pull-style subscriptions supported

• Data reliability: Replicated storage and guaranteed at-least-once message delivery

• Data security and protection: Encryption of data on the wire and at rest
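At-least-once delivery, as listed above, means a subscriber may see the same message more than once, so handlers are usually written to be idempotent. The following is a minimal sketch of that idea in plain Python. It does not use the Cloud Pub/Sub client library; the class name, the in-memory set of seen IDs, and the sample message IDs are all hypothetical.

```python
from typing import Callable, List, Set


class IdempotentHandler:
    """Wraps a processing function so each message ID is handled only once.

    Assumption: every delivery carries a stable message ID. The in-memory
    set is for illustration; a real subscriber would need durable storage.
    """

    def __init__(self, process: Callable[[str], None]) -> None:
        self._process = process
        self._seen: Set[str] = set()

    def handle(self, message_id: str, payload: str) -> bool:
        """Process the payload unless this ID was already handled.

        Returns True if the payload was processed, False for a duplicate.
        """
        if message_id in self._seen:
            return False
        self._process(payload)
        self._seen.add(message_id)
        return True


# Simulated deliveries: message "m1" arrives twice.
results: List[str] = []
handler = IdempotentHandler(results.append)
deliveries = [("m1", "a"), ("m2", "b"), ("m1", "a")]
outcomes = [handler.handle(mid, body) for mid, body in deliveries]
```

Recording the ID only after processing succeeds keeps the at-least-once property: if processing raises, the message can be redelivered and retried.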

Page 45: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Cloud Dataflow
Google Cloud Platform

Page 46: Google cloud big data summit   master gcp big data summit la - 10-20-2015


Google Cloud Platform

Cloud Dataflow: Value

• Reduce cost of processing large datasets

• Save time: Automatically optimizes data-centric pipeline code by collapsing multiple logical passes into a single execution pass

• Increase efficiencies: Fully manages the lifecycle of required compute resources

• Simple: Dataflow makes it easy to write data-processing pipelines that incorporate both batch and stream-processing capabilities and is language-agnostic
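The "collapsing multiple logical passes into a single execution pass" point can be illustrated with a toy example. This is a sketch of the general idea of fusing element-wise steps, written in plain Python; it is not the Dataflow SDK and does not describe how the service implements its optimizer. All function names are hypothetical.

```python
from functools import reduce
from typing import Callable, Iterable, List

Step = Callable[[int], int]


def run_unfused(data: Iterable[int], steps: List[Step]) -> List[int]:
    """Apply each step as its own full pass over the data."""
    current = list(data)
    for step in steps:
        current = [step(x) for x in current]  # one pass per logical step
    return current


def fuse(steps: List[Step]) -> Step:
    """Compose the steps into one function applied once per element."""
    return lambda x: reduce(lambda acc, step: step(acc), steps, x)


def run_fused(data: Iterable[int], steps: List[Step]) -> List[int]:
    """Apply all steps in a single pass over the data."""
    fused = fuse(steps)
    return [fused(x) for x in data]


steps = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
unfused_result = run_unfused(range(5), steps)
fused_result = run_fused(range(5), steps)
```

Both functions produce the same output; the fused version reads the input once and creates no intermediate lists, which is the saving the slide alludes to.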

Page 47: Google cloud big data summit   master gcp big data summit la - 10-20-2015


Google Cloud Platform

Cloud Dataflow: Features

• Unified programming model for both batch and stream-based data analysis

• Managed scaling: Manages the lifecycle of required compute resources

• Reliable & consistent processing: Built-in support for fault-tolerant execution

• Monitoring: Provides lifecycle statistics, including in-flight information such as real-time pipeline throughput, real-time step lag and real-time worker log inspection

Page 48: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Cloud Dataproc
Google Cloud Platform

Page 49: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Typical Big Data Processing
Programming
Resource provisioning
Performance tuning
Monitoring
Reliability
Deployment & configuration
Handling growing scale
Utilization improvements

Big Data with Google
Programming

Focus on Insight, Not infrastructure
Reduce Time to Understanding

Page 50: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Traditional Big Data = Big Problems

• Continuously accommodating greater data volumes and new data sources
• Capture and store all data for all business functions
• Complexity of building and maintaining a Big Data system with consistent ease of use
• Reducing the time from data collection to action
• Managing the cost of the data platform
• Hurdles to innovate and iterate with Big Data
• Keep system reliable/running
• Keep your data secure
• Collaboration within or across organizations

Page 51: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Google Cloud Platform

[Diagram 1, six numbered stages: Data Collection, ETL, Raw Data Storage, Aggregation, Analytics Storage, Visualization. Other labels: Google Compute and App Engine scalable VMs; Google Cloud Storage; Google BigQuery (TBs of data, process in seconds); interactive dashboards + apps; BI tools; Google Spreadsheets]

[Diagram 2, six numbered steps. Labels: Collection; Transformation, data processing, cleansing; Raw Data Storage; BigQuery Staging; BigQuery Aggregate Staging; ad hoc queries; REST API; Serve Analytics]

Page 52: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Google confidential │ Do not distribute

Overview:
Data to process: Data in the Consolidated Audit Trail (CAT), a data repository of all equities and options orders, quotes, and events

Challenges:
How to process the CAT and organize 100 billion market events into an “order lifecycle” in a 4 hour window
Store 6 years (~30PB) of data

Cloud Bigtable to process and run queries and tolerate volume increases

6 BILLION MARKET EVENTS WRITTEN PER HOUR
1.7 GIGS PER SECOND
6 TBs PER HOUR

10 BN WRITTEN PER HOUR BURSTS
1.7 GIGABYTES PER SECOND
10 TERABYTES PER HOUR
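The 1.7 GB per second and 6 TB per hour figures quoted on this slide are consistent with each other, which a quick unit conversion shows. The sketch below uses decimal units (1 TB = 1,000 GB). The average event size at the end additionally assumes that the 6 billion events and the 6 TB per hour describe the same write stream, which the slide does not state explicitly.

```python
SECONDS_PER_HOUR = 3600
GB_PER_TB = 1000  # decimal units, an assumption about how the slide counts


def tb_per_hour(gb_per_second: float) -> float:
    """Convert a sustained rate in GB/s to TB/hour."""
    return gb_per_second * SECONDS_PER_HOUR / GB_PER_TB


def average_event_bytes(tb_written_per_hour: float, events_per_hour: float) -> float:
    """Average bytes per event, assuming both figures cover the same stream."""
    return tb_written_per_hour * 1e12 / events_per_hour


sustained = tb_per_hour(1.7)                 # about 6.12 TB per hour
event_size = average_event_bytes(6.0, 6e9)   # about 1,000 bytes per event
```

So 1.7 GB per second works out to roughly 6 TB per hour, and under the stated assumption each market event averages about 1 KB.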

Page 53: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Overview:
Data to process: standard game KPIs, marketing data, custom game insight
Several dozen gigabytes of raw logs per day

Challenges:
Struggled to process large volume of data
Long delays between triggering logs and querying data; problematic for games running live events
Issues controlling permissions
Long-running queries, clunky analysis

“BigQuery has helped us focus on actually using data instead of exhausting ourselves just trying to get to the data.”

CRUNCH 150 GIGS OF DATA IN 15 SECONDS
INSTANT LOG INGESTION
SCALE WITHOUT CLOGGING THE SYSTEM
FLEXIBILITY ON PERMISSION CONTROLS

Page 54: Google cloud big data summit   master gcp big data summit la - 10-20-2015


Page 55: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Snapchat sends 700 million photos and videos each day

Google App Engine scaled seamlessly during growth to millions of users

Small team is able to innovate quickly and expand globally

“App Engine enabled us to focus on developing the application. We wouldn’t have gotten here without the ease of development that App Engine gave us.”
Bobby Murphy, CTO

Page 56: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Big Data Partner Ecosystem

Chartio

Page 57: Google cloud big data summit   master gcp big data summit la - 10-20-2015

cloud.google.com

Page 59: Google cloud big data summit   master gcp big data summit la - 10-20-2015

BLUECAVA, INC. / 2015

CROSS SCREEN STARTS HERE

Page 60: Google cloud big data summit   master gcp big data summit la - 10-20-2015

BLUECAVA
Business / Product / Challenges

Page 61: Google cloud big data summit   master gcp big data summit la - 10-20-2015

INTRODUCTION

Reza Qorbani
CTO @ BlueCava

• Working with Google Big Data Team for the past 1.5 years

• Move from 100% Private Cloud to Hybrid Environment

• Deep Integration with Big Query

Email

[email protected]

Twitter

@qorbani

Page 62: Google cloud big data summit   master gcp big data summit la - 10-20-2015

ABOUT – BlueCava

Open Network that Optimizes Cross-Screen Marketing

AdTech Platforms
DISPLAY, MOBILE, VIDEO, EXCHANGE, SOCIAL

Real-time Intelligence
Association Graph

DataTech Platforms
VALIDATION, DEMOGRAPH, LOCATION, EXCHANGE, COVERAGE

MARTECH PLATFORMS & SERVICES

Page 63: Google cloud big data summit   master gcp big data summit la - 10-20-2015

ABOUT – Association Graph

Household

Consumer B Consumer A Consumer C

IDFA APN BCID
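The hierarchy this slide draws (a household containing consumers, each tied to identifiers such as IDFA, APN and BCID) can be sketched as a small data structure. This is only an illustration of the shape of such a graph, not BlueCava's actual model: the class names, field names and sample ID values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Consumer:
    """One consumer and the identifiers associated with them."""
    name: str
    # Maps an identifier type (e.g. "IDFA", "APN", "BCID") to its values.
    ids: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class Household:
    """A household grouping several consumers."""
    household_id: str
    consumers: List[Consumer] = field(default_factory=list)

    def ids_of_type(self, id_type: str) -> List[str]:
        """Collect every identifier of one type across the household."""
        return [
            value
            for consumer in self.consumers
            for value in consumer.ids.get(id_type, [])
        ]


# Made-up sample data following the slide's labels.
household = Household(
    household_id="hh-1",
    consumers=[
        Consumer("Consumer A", {"IDFA": ["idfa-1"], "BCID": ["bcid-1"]}),
        Consumer("Consumer B", {"APN": ["apn-1"]}),
        Consumer("Consumer C", {"IDFA": ["idfa-2"]}),
    ],
)
household_idfas = household.ids_of_type("IDFA")
```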

Page 64: Google cloud big data summit   master gcp big data summit la - 10-20-2015

ABOUT – Coverage

100M / Households

240M / Consumers

600M / Devices

Page 65: Google cloud big data summit   master gcp big data summit la - 10-20-2015

ABOUT – Volume

5 TB Daily: Daily RAW Logs
250k req/sec: From Partners and Exchanges
1.3 Petabyte: Total Storage
25 Billion IDs: Including our Partner IDs

Page 66: Google cloud big data summit   master gcp big data summit la - 10-20-2015

ABOUT – Challenge

Cost
− Bandwidth Cost
− Storage Cost
− Infrastructure Cost
− Operation Cost

Flexibility
− Easily run Ad-Hoc queries
− Handle lots of POCs
− Flexible to Change
− Unified Data Store

Delivery
− Generate data for customers
− Multiple extraction at time
− Keep data for months
− Highly Available

Page 67: Google cloud big data summit   master gcp big data summit la - 10-20-2015

ARCHITECTURE
BlueCava Platform Overview / Before / Now / Future!

Page 68: Google cloud big data summit   master gcp big data summit la - 10-20-2015

ARCHITECTURE – BlueCava Platform Overview

[Diagram labels: CORE, INTERNAL, CUSTOMER, PLATFORM; EDGEX, BIDDER, OPERATIONS, QUALITY, API, PORTAL; METADATA, PREPARE, LOGGING, AGGREGATE, FILTER, DETECTOR; TRANSFER / PREPARE, PROCESS / ASSOCIATION, ANALYZE / REPORT; AG, AE, DB]

Page 69: Google cloud big data summit   master gcp big data summit la - 10-20-2015

ARCHITECTURE – Before

[Diagram labels: WEST (IRVINE), EAST (ASHBURN); CORE, CORE, CUSTOMER, INTERNAL, PLATFORM, BACKUP / DR; Geographic Load Balancing; XDC NET]

Page 70: Google cloud big data summit   master gcp big data summit la - 10-20-2015

ARCHITECTURE – Before / Challenges

Cost
▪ Estimate of $1.5M upfront to scale up
▪ High Monthly Bandwidth cost
▪ Need to Extend Operation team

Scalability
▪ Datacenter Issue with Traffic spikes
▪ Need to scale down after POC finishes

Performance
▪ Some processes took more than a day
▪ Customer delivery takes 5-10 hours
▪ Ad-Hoc queries taking hours

Storage
▪ Need more historical data to increase quality
▪ Need to keep customer data for months
▪ Deliver large amount of data to customers

Complexity
▪ Simple Tasks Require Data Engineering Expertise
▪ Customizing Data Output was hard

Resource Limitations
▪ Data Scientists need meaningful data set
▪ QA/Dev Environment Separation
▪ Ad-Hoc queries create issue for production

Page 71: Google cloud big data summit   master gcp big data summit la - 10-20-2015

ARCHITECTURE – Before / Solution

Big Query

▪ Big Data as a Service
▪ Extremely cost effective for our use-case
▪ Support Hierarchical Data Model
▪ Extremely fast
▪ Query using SQL

▪ Solve most of our Big Data challenges
▪ Fraction of cost (It was Unbelievable)
▪ Customer Delivery in Seconds!!!
▪ We dropped Delivery Spark Cluster (10 nodes)
▪ We dropped Ad-Hoc Hadoop Cluster (100x nodes)
▪ Offload ALL Customer Facing Jobs
▪ Only 2 Sprints Development (6 Weeks)

Page 72: Google cloud big data summit   master gcp big data summit la - 10-20-2015

ARCHITECTURE – Before / Solution

Cloud Storage

▪ Nice integration with Big Query
▪ No file size limit like S3
▪ HDFS Integration using Hadoop Connector
▪ Seamless Cost Saving: DRA and Nearline

▪ Solved most of our Storage challenges
▪ Simplified our file delivery
▪ Extremely competitive pricing
▪ No need for Backup ☺

Page 73: Google cloud big data summit   master gcp big data summit la - 10-20-2015

ARCHITECTURE – Before / Solution

Compute Engine

▪ Great Sustained Pricing
▪ No need for long-term contract
▪ Simple CLI for Automation
▪ BDUtil Library for Hadoop

▪ Elastic Environment which saved us on Cost
▪ 100+ nodes Hadoop under 6 minutes
▪ Use as On-Demand Resource as needed
▪ Stop purchasing more hardware!

Page 74: Google cloud big data summit   master gcp big data summit la - 10-20-2015

ARCHITECTURE – Now

[Diagram labels: WEST (IRVINE), Google Cloud Platform; CORE, CUSTOMER, INTERNAL, PLATFORM; Cloud Storage, Big Query; Simple DNS; Interconnect]

Page 75: Google cloud big data summit   master gcp big data summit la - 10-20-2015

ARCHITECTURE – Future!

Cost: Move all in Cloud
Scalability: World-wide Coverage
Performance: Real-time Association
Simplify: Data Science Lab

Container Engine, Dataproc, Dataflow, Datalab

Page 76: Google cloud big data summit   master gcp big data summit la - 10-20-2015

ARCHITECTURE – Future!

CORE REALTIME PROCESS

ASSOCIATION GRAPH

QUERY

LAB

STORAGE

INTERNAL

CUSTOMER

BATCH PROCESS

Page 77: Google cloud big data summit   master gcp big data summit la - 10-20-2015


THANK YOU

Page 79: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Amin Bandeali, Founder & CTO
Pixalate, Inc.

Page 80: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Agenda
● What is Pixalate?
● My Role @ Pixalate
● Pixalate Breadth and Depth
● What is Ad Fraud and why is it important to solve?
● Challenges
● Ad Fraud
● Real World BigQuery Use Cases
● Conclusion

Page 81: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Our Mission

To Rate the Whole Internet… and YES, we also see what Google doesn’t see!

Page 82: Google cloud big data summit   master gcp big data summit la - 10-20-2015

What is Pixalate?
Pixalate is a de facto Ratings Standard for Programmatic Advertising.

SellerTrustIndex.com

Page 83: Google cloud big data summit   master gcp big data summit la - 10-20-2015

My Role @ Pixalate
● Co-Founder, CTO and Solution Architect
● Real-Time Data Junkie - Contributed to Apache Hadoop Project
● Largest AWS DynamoDB user upon launch - not using it anymore!
● Largest AWS SQS user - not using it anymore!
● Pixalate backend runs Java, NodeJS, Redis, Solr, S3 and BigQuery
● Denied using 25000 free hours of AWS Redshift!
● 70% of Pixalate technology runs on AWS -- 30% on BigQuery
● We move 2TB of data from AWS to Google Storage just for BigQuery

Page 84: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Challenges

Process 1+ Trillion Ad Transactions Data/month

Processing Up to 3 PB/month

Analyze Massive amounts of Data to detect fraud

Create customized reports with NO engineering support!

Close to 1 Trillion rows of data in BigQuery

Page 85: Google cloud big data summit   master gcp big data summit la - 10-20-2015

What is Ad Fraud?

Page 86: Google cloud big data summit   master gcp big data summit la - 10-20-2015
Page 87: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Ad Fraud against AdMob and McDonald’s

Page 88: Google cloud big data summit   master gcp big data summit la - 10-20-2015
Page 89: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Our Realtime Fraud Map

http://www.pixalate.com/map

Page 90: Google cloud big data summit   master gcp big data summit la - 10-20-2015

What’s wrong with this data?

Page 91: Google cloud big data summit   master gcp big data summit la - 10-20-2015

A day in the life of Data Science Team
● An Account Manager asks the data science team for a customized report for a client that measures some specific metrics for the last 6 months of their data.
● Solution 1: AWS EMR - Boring and takes Hours!
○ The Big data engineers will execute an EMR (Hive) job that extracts the data and creates the report
● Solution 2: BigQuery - Fun and takes Seconds!
○ The data science team implements a usually complex query that calculates all the metrics in SQL
○ BigQuery will process a couple of TB of data and create the report in a few seconds.

Page 92: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Bypassing the Engineers!
● We need to expand a list of 500,000 network addresses in CIDR format (e.g. 128.0.0.1/24) to regular IP format and use them in client reports
● Solution 1: Java
○ provide the Java engineers with the requirements
○ wait for implementation completion
○ wait for UAT and Production push
○ store the data in a database
■ total time ~3 workdays (in Startup Timezone)
● Solution 2: BigQuery
○ the data science team writes a query with 25+ table JOINs and UNIONS that takes care of the expansion in a clean, easy to test way, and runs it in BigQuery
■ total time ~3 hours
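To make the task concrete, here is what expanding a CIDR block into individual IP addresses looks like. This sketch uses Python's standard `ipaddress` module purely as an illustration; it is not the BigQuery query the speakers describe, and the function name is hypothetical. The slide's example, 128.0.0.1/24, has host bits set, so the sketch passes `strict=False` to accept it.

```python
import ipaddress
from typing import Iterable, Iterator


def expand_cidrs(blocks: Iterable[str]) -> Iterator[str]:
    """Yield every IP address contained in each CIDR block."""
    for block in blocks:
        # strict=False accepts inputs with host bits set, such as the
        # slide's 128.0.0.1/24, and treats them as the enclosing network.
        network = ipaddress.ip_network(block, strict=False)
        for address in network:
            yield str(address)


# A /24 block contains 256 addresses.
addresses = list(expand_cidrs(["128.0.0.1/24"]))
```

A generator keeps memory use flat, which matters when the input is a list of 500,000 blocks rather than one.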

Page 93: Google cloud big data summit   master gcp big data summit la - 10-20-2015

From Waste Picking to Innovation

● The amount of digital data in the universe is growing at an exponential rate, doubling every two years, and changing how we live in the world.
○ YET only .5% of that data is analyzed!
● If you can’t mine these data easily and extract semantics,
○ then how is data collection different than waste-picking???
● BigQuery enables Innovation
○ It breaks the dependency between data scientists and big-data engineers
○ Now data scientists can write complex queries and analyse massive amounts of data without the need of any backend coding (e.g. Java), or some other big data framework
○ It enables the deep understanding of complex data and their dependencies

Page 94: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Cost reduction using BigQuery
● Complex data processing pipelines impose a new cost optimization challenge
● Main questions to be answered:
○ Where do I store the data I collect?
○ Where/How do I aggregate the data I collect?
○ How do I enhance the data I collect with other metadata?
○ How do I process the data collected?
■ such that the overall cost is minimized??
● BigQuery can HELP!

Page 95: Google cloud big data summit   master gcp big data summit la - 10-20-2015

BigQuery Health Monitoring Using BigQuery

Page 96: Google cloud big data summit   master gcp big data summit la - 10-20-2015

But Wait!

Here’s the real benefit...

Page 97: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Zero Cost Queries Over Petabytes!

● How can you query PETABYTES of historical data and create time series to detect traffic anomalies (e.g. network failures, etc)?

● BigQuery Zero Cost queries (a.k.a. table metadata) ○ can give you the big picture regarding table’s data health

■ within seconds ■ without having to run any costly queries

suspicious activity
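A minimal sketch of the table-metadata idea, assuming per-table row counts have already been fetched (for example from BigQuery's `__TABLES__` meta-table, which exposes `table_id`, `row_count`, and `size_bytes` without scanning table data). The daily-shard naming (`events_YYYYMMDD`), the sample numbers, and the median-based threshold are hypothetical illustrations, not the presenter's actual method.

```python
from statistics import median


def find_anomalies(daily_rows, window=7, tolerance=0.5):
    """Flag days whose row count deviates from the trailing median.

    daily_rows: list of (table_id, row_count) tuples sorted by date.
    window:     number of preceding days used as the baseline.
    tolerance:  allowed relative deviation from the baseline (0.5 = 50%).
    """
    anomalies = []
    for i in range(window, len(daily_rows)):
        table_id, rows = daily_rows[i]
        # Baseline is the median of the previous `window` days.
        baseline = median(r for _, r in daily_rows[i - window:i])
        if baseline > 0 and abs(rows - baseline) / baseline > tolerance:
            anomalies.append((table_id, rows, baseline))
    return anomalies


# Hypothetical metadata for nine daily shards (table name, row count).
metadata = [
    ("events_20151001", 1000),
    ("events_20151002", 1020),
    ("events_20151003", 980),
    ("events_20151004", 1010),
    ("events_20151005", 990),
    ("events_20151006", 1005),
    ("events_20151007", 995),
    ("events_20151008", 300),   # sudden drop, e.g. a collection outage
    ("events_20151009", 1000),
]

anomalies = find_anomalies(metadata)
for table_id, rows, baseline in anomalies:
    print(f"{table_id}: {rows} rows vs. baseline {baseline}")
```

Because only metadata is read, a check like this can run over every shard in a dataset without incurring query charges; a median baseline is used here so that a single bad day does not distort the comparison for the days that follow.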

Page 98: Google cloud big data summit   master gcp big data summit la - 10-20-2015

BigQuery Success is all about the Architecture

Spend a LOT of time on Table Schemas (hint: keep them flat)

Page 99: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Learnings

● BigQuery has its gotchas!
○ The wrong Sharding strategy can slow you down
○ Know your Quotas well -- they will haunt you!
○ Balance the table JOINs appropriately
○ Don’t use ORDER BY unless it’s mandatory
○ Avoid “SELECT *” queries on “fat” tables over long time ranges

● Secret recipe
○ push as much complexity as possible to BigQuery using advanced queries
■ usually > 100 lines of SQL code
○ use backend languages (e.g. Java) to simply orchestrate the data pipeline
○ don’t be scared of data duplication -- storage cost is much cheaper than analysis cost!
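The “SELECT *” and data-duplication points can be illustrated with rough arithmetic. BigQuery's on-demand billing is based on the bytes in the columns a query reads, so narrowing the column list or the date range shrinks the bill. The sketch below assumes a hypothetical table sharded by day; the column sizes and both prices are placeholders for illustration and should be replaced with current list prices.

```python
GB = 1024 ** 3
TB = 1024 ** 4

# Placeholder prices (assumptions, not quoted from the talk).
QUERY_PRICE_PER_TB = 5.0          # on-demand analysis, dollars per TB scanned
STORAGE_PRICE_PER_GB_MONTH = 0.02  # storage, dollars per GB per month

# Hypothetical bytes stored per column in each daily shard of a "fat" table.
column_bytes_per_day = {
    "timestamp": 8 * GB,
    "ip": 16 * GB,
    "user_agent": 200 * GB,
    "payload": 800 * GB,
}


def scan_cost(columns, days, price_per_tb=QUERY_PRICE_PER_TB):
    """Estimated cost of one query reading `columns` across `days` shards."""
    scanned = sum(column_bytes_per_day[c] for c in columns) * days
    return scanned / TB * price_per_tb


def storage_cost(total_bytes, months, price=STORAGE_PRICE_PER_GB_MONTH):
    """Estimated cost of keeping `total_bytes` stored for `months`."""
    return total_bytes / GB * months * price


# One "SELECT *" over 30 days versus a query that reads two columns.
select_star = scan_cost(list(column_bytes_per_day), days=30)
narrow = scan_cost(["timestamp", "ip"], days=30)

# Keeping a duplicated, narrow copy of those two columns for one month.
narrow_copy_bytes = (8 * GB + 16 * GB) * 30
duplicate_storage = storage_cost(narrow_copy_bytes, months=1)

print(f"SELECT * over 30 days:      ${select_star:,.2f}")
print(f"Two columns over 30 days:   ${narrow:,.2f}")
print(f"Storing a narrow copy/month: ${duplicate_storage:,.2f}")
```

Under these assumed numbers, a month of storage for the duplicated narrow table costs far less than a single full-width scan, which is the trade-off the last bullet describes.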

Page 100: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Q&A

Amin Bandeali

p: 888.749.2528
m: 714.757.9544
e: [email protected]
t: http://twitter.com/aminbandeali

Page 101: Google cloud big data summit   master gcp big data summit la - 10-20-2015


Panel Q&A

Rohit Khare, GCP Big Data PM
Reza Qorbani, BlueCava CTO
Amin Bandeali, Pixalate CoFounder & CTO

Page 102: Google cloud big data summit   master gcp big data summit la - 10-20-2015

04 Partner Story - Magnus Unum
Rajesh Babu, BI, Big Data & Analytics solutions Architect
Subash D'Souza, Big Data Evangelist

deck

Page 103: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Magnus Unum
Raj Babu & Subash D’Souza

Modern BI & Big Data platform with Google Cloud

Page 104: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Magnus Unum…what we do

We are an LA-based Big Data, Data Science & Analytics Consulting Services firm specializing in advising our clients on Strategy, Road Map/Blueprint, Implementation, Deployment, and Maintenance/Support/Operations for their Big Data, Data Science, BI and Analytics solutions.

Page 105: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Magnus Unum…Leadership

• Raj Babu
  • Co-Founder, Magnus Unum
  • Founder, Agile iSS
  • 20 years of experience in the BI & Analytics field
  • Worked on numerous, very large BI migration and Integration projects

• Subash D’Souza
  • Over 10 years of experience in building scalable solutions for various enterprise companies
  • Organizer for several LA User Groups including Big Data, Apache Spark & Apache HBase
  • Organizer for Big Data Day LA
  • Recognized as a Champion of Big Data by Cloudera

Page 106: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Magnus Unum - Key Services

• Architect, Design & Build Big Data Solutions
• Cloud Migration services for Big Data, Analytics & BI
• Big Data Engineering & Staffing
• Big Data managed & support services
• Data Science Solutions & Services

Page 107: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Magnus Unum – Expertise

• On-Prem
Cloudera, Hortonworks, IBM, Pivotal & MapR

• Cloud
Google Cloud, Amazon AWS & Microsoft Azure

• Analytics/Reporting
Tableau, MicroStrategy, SAP BO, Qlik & Pentaho

• Data Science
Machine Learning, R, SAS & Data Analytics

Page 108: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Why Google Cloud Platform?

Page 109: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Use Case 1 – Migrating your Data Warehouse and BI to Google Cloud

• Capture / Migrate or Capture
• Storage / Data Management
• Data Processing
• Query/Analytics
• Data Integration
• Access Control

Page 110: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Use Case 2 – Google Analytics detailed analysis

• Limitation in Google Analytics daily export
• More detailed analysis available as part of Google Cloud Platform (must have premium access)
• Can analyze granular level details of User Interaction on websites and aggregate the results for display on-prem or within GCP

Page 111: Google cloud big data summit   master gcp big data summit la - 10-20-2015

Please reach out to us for a free Consultation & Assessment of your BI, Big Data & Analytics needs & additional $500 in GCP credits!

Page 113: Google cloud big data summit   master gcp big data summit la - 10-20-2015

cloud.google.com/free-trial

Questions? [email protected]

Get $300 in credit to use for 60 days.