big data analytics on the google cloud platform

Big Data Analytics on the Google Cloud Platform

January 9th, 2014

GROW WITH BIG DATA.

Third Eye Consulting Services & Solutions LLC.

For Questions

Tweet Directly to @ThirdEyeCss

We are actively monitoring this Twitter channel!

Agenda

1. 5 minutes - Introductions

2. 15 minutes - Introduction to the Google Cloud Platform & its various Big Data services

3. 10 minutes - Showcasing various Online Retail Analytics - User, Site & Products Analytics

4. 15 minutes - Live Demonstration - Ingestion of session log data to visualization in Tableau

5. 15 minutes - Q&A Session (can extend beyond this based on audience enthusiasm & participation!)

Google Cloud Platform

App Engine, BigQuery, Cloud SQL, Cloud Storage, Compute Engine

Google Cloud Platform – Key Components

https://cloud.google.com

A highly elastic, scale-on-demand infrastructure for deploying and running front-end web applications

App Master

Front End Instance 1, Front End Instance 2, Front End Instance 3, ... Front End Instance n

App Server Instance 1, App Server Instance 2, App Server Instance 3, ... App Server Instance n

Datastore

Memcache

Static Files

App Engine - Architecture

https://cloud.google.com/products/app-engine

Scales on demand

Very low barrier to entry

No initial hardware costs

Scalability and reliability are non-issues

Can handle very large amounts of data

Can handle very large user volumes, including sudden spikes, by scaling elastically

App Engine - Advantages

https://cloud.google.com/products/app-engine

A column-oriented data store that can store and process billions of rows of data

SQL-like query syntax for querying data

Run ad-hoc queries against multi-terabyte data sets in seconds

Highly scalable, reliable and secure because it runs on the underlying core Google platform infrastructure

BigQuery

https://cloud.google.com/products/big-query

Supports all the main ETL and BI tools like Informatica, Talend, QlikView and Tableau

Primarily used for real-time data analysis and visualization

Integration with App Engine through APIs

BigQuery

https://cloud.google.com/products/big-query

SQL Access

Only SELECT operations

No CREATE, UPDATE or DROP

Analysis of unstructured data using the REGEXP_* functions

JOINs between a small table (<8 MB of compressed data) and a large table are possible. Joins between large tables carry a performance penalty.

BigQuery

https://cloud.google.com/products/big-query
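
As a rough illustration of the regular-expression analysis mentioned on this slide, the Python sketch below pulls a product ID out of a page URL with the standard `re` module. It is a local analogue of the kind of extraction a REGEXP function would perform inside a query, not BigQuery code. The URL layout `/product/<id>` and the function name `extract_product_id` are assumptions made for illustration.

```python
import re

# Hypothetical URL layout: product pages look like /product/<numeric id>.
PRODUCT_RE = re.compile(r"/product/(\d+)")


def extract_product_id(url_path):
    """Return the product ID embedded in a URL path, or None if absent."""
    match = PRODUCT_RE.search(url_path)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_product_id("/product/1042?ref=home"))
    print(extract_product_id("/about"))
```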

Programmatic Access

bq command line tool, Google API client library, REST API

Google API client library supports various languages like Java, Python, JavaScript, Ruby, PHP, Google Apps Script

Authentication is handled via OAuth 2.0

With the REST API, the user has to handle credentials and HTTP requests manually

BigQuery

https://cloud.google.com/products/big-query
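
To give a sense of what handling the HTTP request manually can look like, here is a minimal sketch that builds (but does not send) a query request using only the Python standard library. The endpoint shown is the BigQuery v2 `queries` URL as we understand it; the project ID, the access token placeholder and the helper name `build_query_request` are hypothetical, and obtaining a real OAuth 2.0 token is outside the scope of this sketch.

```python
import json
import urllib.request

# BigQuery v2 REST endpoint for synchronous queries (assumed current form).
QUERY_URL = "https://www.googleapis.com/bigquery/v2/projects/{project_id}/queries"


def build_query_request(project_id, access_token, sql):
    """Build (but do not send) an HTTP request that submits a query."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        QUERY_URL.format(project_id=project_id),
        data=body,
        headers={
            # The OAuth 2.0 access token is passed as a bearer token.
            "Authorization": "Bearer " + access_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # "my-project" and "ACCESS_TOKEN" are placeholders, not real credentials.
    req = build_query_request("my-project", "ACCESS_TOKEN", "SELECT 1")
    print(req.get_method(), req.full_url)
```

Sending the request would be a matter of passing it to `urllib.request.urlopen`; the client libraries wrap these steps, which is why they are usually the easier route.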

Use Cases

Can be used for batch analysis of large data sets

Real time analytics for dashboard type applications

Pre-process very large data sets and serve data in real-time

Visualization using third-party tools that call BigQuery APIs.

BigQuery

https://cloud.google.com/products/big-query

MySQL database running on the Google Cloud Platform

Easy migration from local MySQL instances to Cloud SQL

Highly scalable and reliable with replication

Supports all major MySQL features including stored procedures, triggers and views

GUI frontend for easy administration and operations

Built on top of core Google infrastructure

Easy integration with App Engine

Cloud SQL

https://cloud.google.com/products/cloud-sql

A highly reliable cloud storage platform for storing and accessing vast amounts of data

Can be used for data archival and content delivery

Data can be ingested and processed by other Google Cloud Services

Accessible through GUI, command line and APIs

[Diagram: Cloud Storage alongside Cloud SQL, BigQuery and a Custom App]

Cloud Storage

https://cloud.google.com/products/cloud-storage

Object store that can deliver content very efficiently over the internet

Not a mountable file system

Buckets are the basic container. They cannot be nested and can reside in the US or EU geographies.

Objects are stored in buckets. They are immutable and can be up to 5 TB in size.

ACLs can be set up for Google users, groups, app domain, and authenticated users with READ, WRITE or FULL_CONTROL.

Signed URL access for anonymous users.

Can be accessed using XML and JSON REST APIs

Command line access using the gsutil tool

App Engine Storage API for access from App Engine

Cloud Storage

https://cloud.google.com/products/cloud-storage
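
Because buckets cannot be nested, everything after the bucket name in a `gs://` URI (the form used by gsutil) is a single flat object name, even when it contains slashes. The sketch below illustrates that split; the helper name `split_gs_uri` and the sample bucket and object names are hypothetical.

```python
def split_gs_uri(uri):
    """Split 'gs://bucket/object/name' into (bucket, object_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError("not a gs:// URI: " + uri)
    # Only the first path component is the bucket; the rest is one object name.
    bucket, _, object_name = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError("missing bucket name: " + uri)
    return bucket, object_name


if __name__ == "__main__":
    print(split_gs_uri("gs://retail-logs/2014/01/09/session.log"))
```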

Infrastructure as a service

Linux Virtual machines with associated storage and network infrastructure are hosted by Google

Can run any type of application or workload in the Google cloud, which uses the same core Google infrastructure

Highly elastic and scalable

A typical use case would be to provision a Hadoop cluster on demand using tens to hundreds of virtual machines as name node and data nodes

Compute Engine

https://cloud.google.com/products/compute-engine

Various machine type configurations possible such as High Memory, High CPU, Standard etc.

Very easy provisioning and management using cloud management software like RightScale

CentOS and Debian are the default OSes currently supported.

Typical use cases are batch processing, log analysis, I/O-intensive workloads, and Hadoop on the cloud (MapReduce)

Compute Engine

https://cloud.google.com/products/compute-engine

Online Retail Analytics & Visualization

Online Retail Industry

Forrester: U.S. Online Retail Sales to Hit $370 Billion by 2017

Healthcare Store

Large online retailer’s Health Store website.

Thousands of health care products are sold per month.

These large online retailers are killing us!

I need to increase sales. I need to understand my site visitors better. Can Big Data Analytics help?

VP OF MARKETING

DATA SCIENTIST

Yes, Big Data Analytics can help! Google’s Cloud Platform handles all the complexities of Big Data processing. We start with regular session log files.

Time & Date when visitor came on site

Unique User & Session Id

Product Page Visited by User

Referral Site

Session Log File (W3C compliant)
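
A minimal sketch of reading such a log in Python is shown below. It assumes the W3C extended log layout, in which a `#Fields:` directive names the space-separated columns that follow. The field names `x-user-id` and `x-session-id` and the sample rows are invented for illustration; the real log's columns may differ.

```python
def parse_w3c_log(lines):
    """Yield one dict per entry, keyed by the names in the #Fields directive."""
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directives start with '#'; only #Fields changes how rows are read.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        yield dict(zip(fields, line.split()))


# Hypothetical sample data, not taken from the demonstration.
SAMPLE = """\
#Version: 1.0
#Fields: date time x-user-id x-session-id cs-uri-stem cs(Referer)
2014-01-09 10:15:02 u1 s1 /product/1042 http://www.google.com/
2014-01-09 10:47:30 u2 s2 /product/2001 -
"""

if __name__ == "__main__":
    for entry in parse_w3c_log(SAMPLE.splitlines()):
        print(entry)
```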

DATA SCIENTIST

From the simple log files, we can do sophisticated analytics like these:

User Analytics

• # of Unique Site Visitors, per hour, per day

• # of Return Site Visitors, per hour, per day

• Total # of Site Visitors, per hour, per day

• Top 10 Active Users per hour, per day
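
Two of these user metrics can be sketched locally in a few lines of Python. This only illustrates the counting logic on a tiny in-memory sample, not how the demonstration computes it in BigQuery; the record layout `(date, time, user_id)` and the function names are assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical records: (date, time, user_id), one per page view.
VISITS = [
    ("2014-01-09", "10:15:02", "u1"),
    ("2014-01-09", "10:47:30", "u2"),
    ("2014-01-09", "10:59:11", "u1"),
    ("2014-01-09", "11:05:45", "u1"),
]


def unique_visitors_per_hour(visits):
    """Map 'YYYY-MM-DD HH' to the number of distinct users seen in that hour."""
    users_by_hour = defaultdict(set)
    for date, time, user_id in visits:
        users_by_hour[date + " " + time[:2]].add(user_id)
    return {hour: len(users) for hour, users in users_by_hour.items()}


def top_active_users(visits, n=10):
    """Return the n users with the most page views, most active first."""
    return Counter(user_id for _, _, user_id in visits).most_common(n)


if __name__ == "__main__":
    print(unique_visitors_per_hour(VISITS))
    print(top_active_users(VISITS))
```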

DATA SCIENTIST

Product Analytics like these:

• Top 10 Popular Products per hour, per day

• Top 10 Popular Products in Shopping Basket per hour, per day

• Top 10 Bought Products per hour, per day
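
A "top N products" ranking of this kind reduces to filtering events and counting them. The sketch below shows the idea on a small hypothetical sample; the event layout `(date, action, product_id)` and the action names `view`, `basket` and `buy` are assumptions, not fields confirmed by the log format.

```python
from collections import Counter

# Hypothetical events: (date, action, product_id); action names are assumed.
EVENTS = [
    ("2014-01-09", "view", "p1"),
    ("2014-01-09", "view", "p2"),
    ("2014-01-09", "view", "p1"),
    ("2014-01-09", "basket", "p1"),
    ("2014-01-09", "buy", "p2"),
    ("2014-01-10", "view", "p2"),
]


def top_products(events, date, action, n=10):
    """Return the n most frequent products for one action on one day."""
    counts = Counter(
        product for d, a, product in events if d == date and a == action
    )
    return counts.most_common(n)


if __name__ == "__main__":
    print(top_products(EVENTS, "2014-01-09", "view"))
```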

DATA SCIENTIST

Conversion Analytics like these:

• # of users who added products to shopping basket per hour, per day

• # of users who actually bought products per hour, per day

• % of users who browsed, added products to shopping cart & actually bought per hour, per day.
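
The conversion percentage can be read as: of the users who browsed, what share also added a product to the cart and then bought? The following sketch computes that for one time window under that reading. The event layout `(user_id, action)` and the action names are hypothetical.

```python
# Hypothetical events for one time window: (user_id, action).
EVENTS = [
    ("u1", "view"), ("u1", "basket"), ("u1", "buy"),
    ("u2", "view"), ("u2", "basket"),
    ("u3", "view"),
    ("u4", "view"),
]


def conversion_funnel(events):
    """Return user counts per funnel stage and the browse-to-buy percentage."""
    browsed, added, bought = set(), set(), set()
    for user_id, action in events:
        if action == "view":
            browsed.add(user_id)
        elif action == "basket":
            added.add(user_id)
        elif action == "buy":
            bought.add(user_id)
    # A user counts as converted only if they passed through all three stages.
    converted = browsed & added & bought
    rate = 100.0 * len(converted) / len(browsed) if browsed else 0.0
    return {
        "browsed": len(browsed),
        "added": len(added),
        "bought": len(bought),
        "conversion_pct": rate,
    }


if __name__ == "__main__":
    print(conversion_funnel(EVENTS))
```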

DATA SCIENTIST

Behold, The Google Cloud Platform’s Dashboard!

List of available Services.

DATA SCIENTIST

Google Cloud Platform’s Cloud Storage

Session Log Files Uploaded to Cloud Storage.

DATA SCIENTIST

Google Cloud Platform’s BigQuery

Tables on BigQuery with data from Session Log Files.

DATA SCIENTIST

Running a Query on BigQuery

Queries on BigQuery are very much SQL-like, easy to develop & get results fast.

DATA SCIENTIST

Visualize BigQuery’s Results in Tableau

Tableau provides an easy & effective way to develop dashboards & reports.

DATA SCIENTIST

Site Analytics – Referral Site Comparisons

Traffic referred to site from other sources like Google.com

DATA SCIENTIST

Product Analytics - Product Purchase Trends

Analysis of specific products as purchased on site over hours / days in a month

DATA SCIENTIST

Conversion Analytics - Product Added to Cart vs. Bought.

Analysis of which products were placed in cart vs. actually bought over hours / days in a month

DATA SCIENTIST

Conversion Analytics - Conversion Rate Trends

Analysis of which products were placed in cart vs. actually bought over hours / days in a month

DATA SCIENTIST

You now know:

- how your products are selling,

- when they are selling,

- which referring site helps the most

and other such info. You now have the power of Big Data Analytics at your fingertips!

VP OF MARKETING

Wow! Now, I can compete against all the giants!

Let me start on my marketing plans!

Q&A

@ThirdEyeCss

Third Eye is Google’s Partner for the Google Cloud Platform

We are listed on Google’s Cloud Platform partners site: https://cloud.google.com/partners/

Tweet @ThirdEyeCss

Contact:

Dj Das, Founder & CEO, [email protected]

Alan Merrihew, VP of Business Development, [email protected]

Phone - (408) 462-5257

Corporate Site - ThirdEyeCSS.com

Big Data Training - ThirdEyeClasses.com

Big Data Educational Seminars - BigDataCloud.com, BigDataCloudToday.com, meetup.com/BigDataCloud

Big Data Jobs - jobs.BigDataCloud.com

Big Data Analytics As a Service - ClustersTogo.com, Power140.com, Raaser.com, PowerI90.com

THANK YOU!