Mr. D. G. Sancheti · Increase Revenue, Decrease Costs, Increase Productivity: Why Big Data Analytics? ...

Presented by: Mr. D. G. Sancheti


Post on 18-Aug-2020




Presented by:

Mr. D. G. Sancheti


“We are drowning in data, but starving for knowledge!”


TODAY'S SHOW


You will learn a few data analysis topics

Posing a question

Wrangling your data into a format you can use and fixing any problems with it

Exploring the data, finding patterns in it, and building your intuition about it

Drawing conclusions and/or making predictions

Communicating your findings


What is Big Data Analytics?

Data analytics is an emerging technique that dives into a data set without a prior set of hypotheses

Accumulated raw data captured from various sources (e.g. discussion boards, emails, exam logs, chat logs in e-learning systems) can be used to identify fruitful patterns and relationships

Examining large amounts of data


Why Big Data Analytics?

Data Drives Performance

Big Data Analytics Drives Results

Increase Revenue

Decrease Costs

Increase Productivity


Why Big Data Analytics??


Applications of Data analytics

Understanding and targeting Customers

Understanding and optimizing Business Processes

Improving Healthcare and Public Health

Optimizing Machine and Device Performance

Financial Trading

Improving and Optimizing Cities and Countries

Can you think of anything more??

How??


Reference Models

CRISP-DM

Agile methodology: ASD-DM


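Cross Industry Standard Process for Data Mining (CRISP-DM)

The CRISP-DM reference model has six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment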


The BIG Four

Classification

Cluster Analysis

Association Rules

Prediction


Data Classification

Some Examples:

Separating customers based on gender

Sorting data based on content type, file type, size, etc.

Classifying data into restricted, public, or private data types

"Among all the customers of Zalando, which are likely to respond to a new offer?"

Classes: will respond, will not respond


Classification Methods

Decision trees (DT)

Build classification or regression models in the form of a tree structure
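A minimal sketch of how a small classification tree could be grown, in plain Python. The split criterion (Gini impurity), the toy customer data, and all function names are assumptions made for illustration; the slides do not say which criterion or library is used.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature_index, threshold) with the lowest weighted Gini impurity."""
    best, best_score, n = None, float("inf"), len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must send rows to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best_score, best = score, (f, t)
    return best

def majority(labels):
    """Most common class label."""
    return Counter(labels).most_common(1)[0][0]

def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a tree as nested dicts; leaves are plain class labels."""
    if len(set(labels)) == 1 or depth == max_depth:
        return majority(labels)
    split = best_split(rows, labels)
    if split is None:
        return majority(labels)
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree

# Hypothetical toy data: (age, past_purchases) -> did the customer respond to an offer?
rows = [(22, 1), (25, 0), (47, 5), (52, 7), (46, 6), (30, 1)]
labels = ["no", "no", "yes", "yes", "yes", "no"]
tree = build_tree(rows, labels)
```

On this toy data a single split on the first feature already separates the two classes, so the tree has one decision node and two leaves.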


Classification Methods

Decision Trees to Decision Rules
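One way to read a decision tree as decision rules is to emit one IF ... THEN ... rule per root-to-leaf path. The sketch below assumes a tree stored as nested dictionaries; the tree itself, the feature names, and the thresholds are hypothetical.

```python
def tree_to_rules(node, conditions=()):
    """Yield one 'IF ... THEN ...' rule for every leaf of a nested-dict tree."""
    if not isinstance(node, dict):
        # Leaf reached: the conditions collected on the way down form the rule body
        lhs = " AND ".join(conditions) if conditions else "TRUE"
        yield f"IF {lhs} THEN {node}"
        return
    name, t = node["feature"], node["threshold"]
    yield from tree_to_rules(node["left"], conditions + (f"{name} <= {t}",))
    yield from tree_to_rules(node["right"], conditions + (f"{name} > {t}",))

# Hypothetical tree for the "will respond / will not respond" question
example_tree = {
    "feature": "age",
    "threshold": 30,
    "left": "will not respond",
    "right": {
        "feature": "past_purchases",
        "threshold": 2,
        "left": "will not respond",
        "right": "will respond",
    },
}

rules = list(tree_to_rules(example_tree))
for rule in rules:
    print(rule)
```

Each leaf yields exactly one rule, so a tree with three leaves produces three rules.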


Classification Methods

Support Vector Machines (SVM)

Each data item is a point in n-dimensional space (n is the number of features)

Find the hyperplane that differentiates the two classes


Classification Methods

Which do you think are the separating hyperplanes?


Classification Methods

Select the hyperplane which segregates the two classes better

Ans: B

Maximise the distance between the hyperplane and the nearest data point (the margin)

Ans: C

Select the hyperplane which classifies accurately before maximising the margin

Ans: A

Ignores outliers

Introduce a new feature: z = x² + y²

In the original input space the hyperplane looks like a circle
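The effect of the extra feature z = x² + y² can be sketched in a few lines. This is not an SVM solver: it only lifts hypothetical points into z and places the boundary midway between the closest points of the two classes, which is the maximum-margin choice when z is the only feature. The point coordinates and class names are assumptions.

```python
import math

# Hypothetical data: one class near the origin, the other on a ring around it.
# No straight line in the (x, y) plane separates these two groups.
inner = [(0.5, 0.0), (-0.3, 0.4), (0.0, -0.6), (0.2, 0.2)]
outer = [(2.0, 0.0), (-1.5, 1.5), (0.0, -2.2), (1.8, -1.0)]

def lift(p):
    """Map a point (x, y) to the new feature z = x^2 + y^2."""
    x, y = p
    return x * x + y * y

z_inner = [lift(p) for p in inner]
z_outer = [lift(p) for p in outer]

# In z the classes are separable by a single threshold; put it midway
# between the largest inner value and the smallest outer value.
threshold = (max(z_inner) + min(z_outer)) / 2
margin = (min(z_outer) - max(z_inner)) / 2

# Back in the input space, z = threshold is a circle of this radius.
radius = math.sqrt(threshold)

def classify(p):
    """Label a point by which side of the threshold its z value falls on."""
    return "outer" if lift(p) > threshold else "inner"
```

The boundary is a straight cut in z, but because z depends on x² + y², the same boundary traces a circle when drawn in the original x-y plane.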


Classification Methods

Bayesian Networks

Based on probability theory.

Can mix expert opinion and data to build models.

Backwards reasoning: in addition to predicting outputs given inputs, we can use output values to infer inputs.

Support for missing data during learning and classification.

Figure legend: dotted lines are potential links; the blue box marks additional nodes and links between input and output.
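Backwards reasoning can be illustrated with the smallest possible network, a single link from an input node to an output node. The network (Rain -> WetGrass) and every probability below are hypothetical values chosen for the example; the calculation itself is Bayes' rule.

```python
# Hypothetical two-node network: Rain -> WetGrass
p_rain = 0.2                              # prior P(rain)
p_wet_given = {True: 0.9, False: 0.1}     # P(wet | rain), P(wet | no rain)

def p_wet():
    """Forward reasoning: marginal probability that the grass is wet."""
    return p_wet_given[True] * p_rain + p_wet_given[False] * (1 - p_rain)

def p_rain_given_wet():
    """Backward reasoning: infer the input (rain) from the observed output (wet)."""
    return p_wet_given[True] * p_rain / p_wet()

print(f"P(wet) = {p_wet():.3f}")
print(f"P(rain | wet) = {p_rain_given_wet():.3f}")
```

Observing the output raises the belief in the input from the prior of 0.2 to roughly 0.69 with these numbers.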


Classification Methods

Bayesian Network Example


Association Rules

Discovering interesting relations between variables in large databases

Example Problems

Which products are frequently bought together by customers? (Basket Analysis)

● DataTable = Receipts x Products

● Results could be used to change the placement of products in the market

Which courses tend to be attended together?

● DataTable = Students x Courses

● Results could be used to avoid scheduling conflicts


Association Rules

Examples

Bread, Cheese → Red Wine

Customers who buy bread and cheese also tend to buy red wine

Machine Learning → Web Mining, ML Praktikum

Students who take 'Machine Learning' also take 'Web Mining' and the 'Machine Learning Praktikum'
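How strongly a rule such as Bread, Cheese → Red Wine holds is usually measured with support and confidence. The sketch below computes both on a handful of hypothetical receipts; the transactions and function names are assumptions.

```python
# Hypothetical receipts (one set of products per purchase)
transactions = [
    {"bread", "cheese", "red wine"},
    {"bread", "cheese", "red wine", "olives"},
    {"bread", "cheese"},
    {"bread", "milk"},
    {"cheese", "red wine"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions containing the antecedent, the share that also contain the consequent."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

rule_support = support({"bread", "cheese", "red wine"})
rule_confidence = confidence({"bread", "cheese"}, {"red wine"})
```

Here two of the five receipts contain all three products, and two of the three receipts with bread and cheese also contain red wine.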


Association Rules

Apriori principle illustration

If {c,d,e} is frequent, then all subsets of this itemset are frequent

Support-based pruning illustration

If {a,b} is infrequent, then all supersets of this itemset are infrequent
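The Apriori principle and support-based pruning can be combined into a level-wise search for frequent itemsets. The following is a compact sketch on hypothetical transactions, with the minimum support given as an absolute count; it is meant to show the pruning idea, not to be an efficient implementation.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset found in at least `min_support` transactions."""
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items]   # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        # Count each candidate and keep only the frequent ones
        counts = {c: sum(c <= t for t in transactions) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that contain exactly k items
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: a candidate survives only if all its (k-1)-subsets are frequent,
        # because any superset of an infrequent itemset must be infrequent
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

# Hypothetical transactions over the items a..e
transactions = [{"a", "c", "d"}, {"b", "c", "e"}, {"a", "b", "c", "e"}, {"b", "e"}]
frequent = apriori(transactions, min_support=2)
```

With these transactions {a,b} is infrequent, so candidates such as {a,b,c} are discarded in the prune step without ever being counted.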


Association Rules: Apriori example


Cluster analysis

Task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters).

Examples

Biology: What is the taxonomy of the species?

Education: What are student groups that need special attention?

Business: What are the customer segments?


Clustering workflow


Cluster analysis

Methodologies

K-Means Clustering

Hierarchical Clustering

And many more!!


K-means clustering

k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster

Unsupervised learning algorithm

Define k centroids, one for each cluster

Take each point in the data set and associate it with the nearest centroid

Recalculate the centroids

Repeat until the centroids no longer move
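The k-means steps above can be sketched in plain Python as follows. The starting centroids are passed in explicitly so the run is deterministic; the sample points and the choice of starting positions are assumptions made for the example.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm; `centroids` holds the k starting positions as tuples."""
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid)
        new_centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:   # stop once no centroid moves
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D observations forming two visible groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
final_centroids, clusters = kmeans(points, [points[0], points[3]])
```

The result depends on the starting centroids, which is why practical implementations usually try several initialisations and keep the best one.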


Hierarchical clustering

Groups data over a variety of scales by creating a cluster tree or dendrogram.

Find the similarity or dissimilarity between every pair of objects in the data set.

Group the objects into a binary, hierarchical cluster tree.

Determine where to cut the hierarchical tree into clusters


Hierarchical clustering

Dissimilarity measures

Grouped (B,F), less dissimilarity

Grouped (A,E), less dissimilarity


Hierarchical clustering


Hierarchical clustering

Cutting the Tree

50% similarity = 50% dissimilarity

Take the clusters formed below 0.5 dissimilarity

(B,F), (A,E,C,G), (D)

This creates 3 clusters, labelled 1, 2, 3
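Building the cluster tree and cutting it at a dissimilarity threshold can be sketched as below. The linkage rule (single linkage), the one-dimensional coordinates for A to G, and the use of raw distance as the dissimilarity are all assumptions; the coordinates were chosen so that a cut at 0.5 gives the same three groups as the example, and they are not the original data.

```python
import math

def single_linkage(points):
    """Agglomerative clustering; returns the merge list [(cluster_a, cluster_b, distance), ...]."""
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters whose closest members are nearest to each other
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

def cut(points, merges, max_distance):
    """Replay the merges up to `max_distance` to obtain flat clusters."""
    clusters = {frozenset([i]) for i in range(len(points))}
    for a, b, d in merges:
        if d > max_distance:
            break   # single-linkage merge distances never decrease, so we can stop here
        clusters -= {a, b}
        clusters.add(a | b)
    return clusters

# Hypothetical 1-D positions for the objects A..G
labels = ["A", "B", "C", "D", "E", "F", "G"]
points = [(0.0,), (2.0,), (0.4,), (4.0,), (0.1,), (2.05,), (0.7,)]

merges = single_linkage(points)
flat = cut(points, merges, max_distance=0.5)
named = sorted("".join(sorted(labels[i] for i in c)) for c in flat)
```

The merge list plays the role of the dendrogram: each entry records which two clusters were joined and at what dissimilarity, and the cut simply stops replaying merges once that dissimilarity exceeds the threshold.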


Clustering workflow

Which algorithm fits my data?

Which parameters fit my data?

How good is the obtained result?

How to improve result quality?
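One common way to approach "How good is the obtained result?" is an internal quality score such as the silhouette coefficient, which compares how close each point is to its own cluster with how close it is to the nearest other cluster. The slides do not name a specific measure, so the choice of silhouette and the sample clusterings below are assumptions.

```python
import math

def silhouette(clusters):
    """Mean silhouette coefficient for a list of at least two clusters (each a list of points).

    Assumes no cluster consists only of identical points, which would make a and b both zero.
    """
    scores = []
    for ci, cluster in enumerate(clusters):
        for p in cluster:
            if len(cluster) == 1:
                scores.append(0.0)   # convention for singleton clusters
                continue
            # a: mean distance to the other members of the same cluster
            a = sum(math.dist(p, q) for q in cluster) / (len(cluster) - 1)
            # b: mean distance to the members of the nearest other cluster
            b = min(sum(math.dist(p, q) for q in other) / len(other)
                    for cj, other in enumerate(clusters) if cj != ci)
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Two hypothetical groupings of the same six points
good = [[(1, 1), (1.5, 2), (1, 0.5)], [(8, 8), (9, 9.5), (8.5, 9)]]
bad = [[(1, 1), (8, 8), (1, 0.5)], [(1.5, 2), (9, 9.5), (8.5, 9)]]

good_score = silhouette(good)
bad_score = silhouette(bad)
```

Scores lie between -1 and 1, with higher values indicating tighter, better separated clusters; comparing the score across algorithms or parameter settings is one way to choose between them.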


Predictive Analytics

Make predictions about unknown future events based on past happenings
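As a small illustration of predicting a future value from past observations, the sketch below fits a straight line by ordinary least squares and extrapolates one step ahead. The monthly sales figures are hypothetical, and a linear trend is only one of many possible predictive models.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical past observations: sales for months 1 to 5
months = [1, 2, 3, 4, 5]
sales = [10, 12, 15, 16, 19]

slope, intercept = fit_line(months, sales)
forecast_month_6 = slope * 6 + intercept
```

The fitted line summarises the past trend; the forecast is only as reliable as the assumption that the trend continues.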

Why now?

Growing volumes and types of data, and more interest in using data to produce valuable insights.

Faster, cheaper computers.

Easier-to-use software.

Tougher economic conditions and a need for competitive differentiation.


Predictive Analytics

Improve pattern detection and prevent criminal behavior.

Determine customer responses or purchases, and promote cross-sell opportunities.

Forecast inventory, manage resources, and set ticket prices.

Use credit scores to assess a buyer's likelihood of default on purchases.


Data Visualization

Data visualization is the process of converting raw data into easily understood pictures of information that enable fast and effective decisions.

Visualization plays a key role in the efficient communication of information (especially with large amounts of information).

Visualization is used as a "check" to verify or falsify the results of automatic data analysis.


Why Data Visualization?

Identify areas that need attention or improvement.

Clarify which factors influence customer behavior.

Help you understand which products to place where.

Predict sales volumes.

Data visualization is a quick, easy way to convey concepts in a universal manner


Where does Visualization fit in CRISP-DM?

Visual Reporting


Visual Analytics Loop

Visual Analytics will foster the constructive evaluation, correction and rapid improvement of our processes and models and, ultimately, the improvement of our knowledge and our decisions


Visual Analytics: Human and Machine


Visual Analytics vs Information Visualization

Visual analytics is more than just visualization. It can rather be seen as an integral approach to decision-making, combining visualization, human factors and data analysis.


Various Data Visualization Techniques
