Open Source for Customer Analytics

Matthias Funke, Business & Technology Consultant

Uploaded by matthias-funke on 11 January 2017


Page 1: Open source for customer analytics

Open Source for Customer Analytics

Matthias Funke, Business & Technology Consultant


Agenda Topics

Open Source Software

Data Products

The “Data Process”

Tying it together


Open Source Software

Examples: Linux, LibreOffice, Eclipse, Hadoop

Source code is open, e.g. on github.com (>3M users, 6.8M repos)

Often governed by foundations, e.g. Apache Software Foundation, Free Software Foundation

Contributors / committers: academia, start-ups, corporations, specialised OSS companies


Popular Apache Software Projects

Project     Donated by
Cassandra   Facebook (2008)
Storm       Twitter (2013)
Hadoop      Yahoo (2008)
Kafka       LinkedIn


Apache Software Foundation Sponsors

Google, Yahoo, Microsoft, Facebook, Citrix…

HP, IBM, Hortonworks, Cloudera, Comcast

Auto & General, Huawei, Pivotal, …

Talend, Twitter


Benefits, Drawbacks & Facts

Benefits
● No licence cost
● Huge amount of knowledge in the community
● High speed of innovation
● Funny names

Drawbacks
● Overwhelming choices
● Varying maturity
● Skills challenge (for newer projects)

Facts of Life
● Professional services / support not free


“Data Products”

Core: valuable data, plus tools to display and manipulate it.

Good data products are live, visual and searchable.

Types:

● Exploratory
● Internal production
● Publicly facing (but free)
● Commercial, i.e. monetised

Volume, Variety, Velocity, Veracity (the “four Vs” of big data)


Popular Data Products

Google Flights (not a booking engine!)

CIA World Factbook (simple presentation)

Inside Airbnb (“activist”)

data.gov.uk


The Data Process

1. Obtain data
2. Explore & clean data
3. Analyse & model
4. Visualise
5. Productionise & automate the data pipeline:

   a. How and where to distribute?
   b. How to scale?
   c. How to secure?
   d. How to manage day-to-day?
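Steps 1 to 5 above can be illustrated as a chain of small R functions, one per step. This is a minimal sketch, not a production pipeline. The function names, the column names (neighbourhood, price) and the cleaning rules (drop exact duplicate rows and rows with missing values) are assumptions. An inline CSV string stands in for a real data source.

```r
# Step 1: obtain data (hypothetical: parse CSV text; in practice a file, API or database)
obtain_data <- function(csv_text) {
  read.csv(text = csv_text, stringsAsFactors = FALSE)
}

# Step 2: explore & clean: drop exact duplicate rows, then rows with missing values
clean_data <- function(df) {
  df <- df[!duplicated(df), , drop = FALSE]
  df[complete.cases(df), , drop = FALSE]
}

# Step 3: analyse & model: mean price per neighbourhood (hypothetical columns)
analyse_data <- function(df) {
  aggregate(price ~ neighbourhood, data = df, FUN = mean)
}

# Step 4: visualise: crude text bar chart, one '#' per 10 units of mean price
visualise_data <- function(res) {
  for (i in seq_len(nrow(res))) {
    cat(sprintf("%-10s %s\n", res$neighbourhood[i],
                strrep("#", round(res$price[i] / 10))))
  }
  invisible(res)
}

# Step 5: productionise: one entry point that runs every step in order
run_pipeline <- function(csv_text) {
  visualise_data(analyse_data(clean_data(obtain_data(csv_text))))
}

raw <- "neighbourhood,price
Camden,100
Camden,100
Camden,120
Hackney,80
Hackney,NA"
result <- run_pipeline(raw)
```

Because each step is a separate function, any one of them can be replaced or moved to another tool without rewriting the rest. That separation matters when answering the distribute, scale, secure and manage questions.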


Data Exploration on One PC


Using ggplot2 for exploratory graphs

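    library(ggplot2)
    # host: listings data frame, e.g. an Inside Airbnb London export (file name assumed)
    host <- read.csv("listings.csv")
    # availability_365: days available over the next 365 days (qplot() deprecated since ggplot2 3.4.0)
    qplot(host$availability_365,
          geom = "histogram",
          binwidth = 5,
          main = "Histogram for Availability",
          xlab = "AirBnB in London",
          fill = I("blue"))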
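You can check what binwidth = 5 does without plotting by using base R's hist() with plot = FALSE. This is a sketch under assumptions. The vector below is hypothetical sample data standing in for host$availability_365. The breaks (every 5 days from 0 to 365) approximate binwidth = 5. ggplot2 may place its bin edges differently by default, so counts near bin edges can differ.

```r
# Hypothetical sample standing in for host$availability_365
availability <- c(0, 0, 3, 12, 14, 90, 180, 364, 365)

# Breaks every 5 days from 0 to 365, i.e. a bin width of 5
breaks <- seq(0, 365, by = 5)
# hist() uses right-closed bins (a, b]; the first bin [0, 5] also includes 0
h <- hist(availability, breaks = breaks, plot = FALSE)

# Bin edges and counts, printing only non-empty bins
nonempty <- h$counts > 0
print(data.frame(from  = head(h$breaks, -1)[nonempty],
                 to    = h$breaks[-1][nonempty],
                 count = h$counts[nonempty]))
```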


Statistical Analysis

SIMPLE

● Sum, Count, Mean / Median

● Variance / Standard Deviation

E.g. Average Revenue per User per Neighbourhood (by Month of the Year)
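The simple statistics and the ARPU example can be sketched in base R. The transactions table below is assumed: its columns (user_id, neighbourhood, month, revenue) and values are hypothetical. Here ARPU per neighbourhood and month is total revenue divided by the number of distinct paying users in that group. Other definitions, such as dividing by all active users, are also common.

```r
# Hypothetical transaction data: one row per payment
tx <- data.frame(
  user_id       = c(1, 1, 2, 3, 3, 4),
  neighbourhood = c("Camden", "Camden", "Camden", "Hackney", "Hackney", "Hackney"),
  month         = c("Jan", "Jan", "Jan", "Jan", "Feb", "Feb"),
  revenue       = c(50, 30, 40, 20, 60, 100)
)

# Simple statistics over all payments
simple_stats <- c(sum = sum(tx$revenue), count = nrow(tx),
                  mean = mean(tx$revenue), median = median(tx$revenue),
                  var = var(tx$revenue), sd = sd(tx$revenue))
print(simple_stats)

# ARPU per neighbourhood and month: total revenue / number of distinct users
rev_by_group   <- aggregate(revenue ~ neighbourhood + month, data = tx, FUN = sum)
users_by_group <- aggregate(user_id ~ neighbourhood + month, data = tx,
                            FUN = function(u) length(unique(u)))
names(users_by_group)[names(users_by_group) == "user_id"] <- "n_users"

arpu <- merge(rev_by_group, users_by_group, by = c("neighbourhood", "month"))
arpu$arpu <- arpu$revenue / arpu$n_users
print(arpu)
```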

MORE COMPLEX

● Clustering

● Covariance matrix (linear dependencies between variables)

● Predictive Models

● Machine Learning
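Two of the more complex techniques can be sketched in base R on a small feature table. The table is assumed: price, availability and review count per listing, with all names and values invented for illustration. cov() gives the covariance matrix and cor() its rescaled form. The slide does not name a clustering algorithm; kmeans() is used here as one common choice.

```r
# Hypothetical listing features
listings <- data.frame(
  price        = c(50, 55, 60, 200, 210, 220),
  availability = c(300, 320, 310, 40, 30, 50),
  reviews      = c(5, 8, 6, 60, 70, 65)
)

# Covariance matrix: symmetric, with each variable's variance on the diagonal
cov_m <- cov(listings)
print(round(cov_m, 1))

# Correlation matrix: covariances rescaled to [-1, 1], comparable across units
print(round(cor(listings), 2))

# k-means on standardised features; k = 2 is an assumption for this toy data
set.seed(42)
km <- kmeans(scale(listings), centers = 2, nstart = 10)
print(km$cluster)
```

Standardising with scale() first keeps the large-valued availability column from dominating the distance calculation.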


Big Data Architectures (simplified)

“Big” Database or Hadoop Cluster / File System (Storage)

Query Engine (Data Access)

Execution Engine (Business Logic)

Search Engine (Accessibility)

Visualisation Layer
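As a toy illustration only, the layers can be mimicked in a single R process with one function per layer. The stand-ins are all assumptions: an in-memory data frame for the big database or Hadoop file system, a row filter for the query engine, aggregate() for business logic, a small inverted index for the search engine, and a text chart for the visualisation layer. In a real deployment each layer would be a separate system, such as a Hadoop cluster, a query engine, Elasticsearch and Kibana.

```r
# Storage layer (stand-in for a "big" database or Hadoop file system)
storage <- data.frame(
  id            = 1:5,
  neighbourhood = c("Camden", "Camden", "Hackney", "Hackney", "Islington"),
  description   = c("quiet flat near park", "loft near market",
                    "garden flat", "flat near canal", "studio near park"),
  price         = c(100, 140, 90, 110, 120),
  stringsAsFactors = FALSE
)

# Query engine (data access): select matching rows and the needed columns
query <- function(store, min_price) {
  store[store$price >= min_price, c("id", "neighbourhood", "price")]
}

# Execution engine (business logic): average price per neighbourhood
execute <- function(rows) {
  aggregate(price ~ neighbourhood, data = rows, FUN = mean)
}

# Search engine (accessibility): inverted index mapping each word to listing ids
build_index <- function(store) {
  words <- strsplit(store$description, " ", fixed = TRUE)
  ids   <- rep(store$id, lengths(words))
  split(ids, unlist(words))
}

# Visualisation layer: text bar chart, one '*' per 10 units of mean price
visualise <- function(res) {
  for (i in seq_len(nrow(res))) {
    cat(sprintf("%-10s %s\n", res$neighbourhood[i],
                strrep("*", round(res$price[i] / 10))))
  }
}

index <- build_index(storage)
print(index[["park"]])  # ids of listings whose description contains the word "park"
visualise(execute(query(storage, min_price = 100)))
```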


Visualisation using Kibana


Trusted Analytics Platform: brand-new OSS


Interactive Notebooks

A new breed of software for working interactively with data

Spark/Scala Notebook

Apache Zeppelin

Databricks: cloud-based (proprietary, but built on Spark)