Ch 7. What Makes a Great Analysis?
Taming The Big Data Tidal Wave
31 May 2012
SNU IDB Lab.
Sengyu Rim
2
Outline
Criteria for a Good Analysis
Frame the Problem Correctly
Making Inferences
3
Criteria for a Good Analysis (1/7)
What is Reporting?
Reporting isn't equal to analysis
– Many organizations mistakenly equate reporting with analysis
In a reporting environment (business intelligence environment), users
– Select the reports they want to run
– Get the reports executed
– View the results
Report: provides data, predefined form, inflexible
4
Criteria for a Good Analysis (2/7)
What is Analysis?
An analysis is an interactive process of
– Tackling a problem
– Finding the data required
– Analyzing the data
– Interpreting the results
Analysis: provides answers, customized, flexible
5
Criteria for a Good Analysis (3/7)
Comparison between Reporting and Analysis
Summary of Analysis versus Reporting

Reporting                     Analysis
Provides data                 Provides answers
Provides what is asked for    Provides what is needed
Is typically standardized     Is typically customized
Does not involve a person     Involves a person
Is fairly inflexible          Is extremely flexible
6
Criteria for a Good Analysis (4/7)
G.R.E.A.T criteria
The G.R.E.A.T criteria will add value to an analysis
– Guided: guided by a business need
– Relevant: relevant to the business
– Explainable: the analysis needs to be explained effectively
– Actionable: a great analysis will be actionable
– Timely: the analysis will be delivered in a timely fashion
7
Criteria for a Good Analysis (5/7)
What are Core Analytics?
Core analytics tend to ask simple questions and provide simple answers
– What happened
– When it happened
– What the impact was
Example: Sales Promotion
1. How many subscribers signed up?
2. How did the sign-ups occur every day?
3. How much money did the new subscribers bring in?
8
Criteria for a Good Analysis (6/7)
What are Advanced Analytics?
Advanced analytics go further than core analytics
– What caused it to happen
– What can be done in the future
Example: Customer Web Activity
1. Identify the relationship between browsing and sales
2. Formulate a marketing strategy
9
Criteria for a Good Analysis (7/7)
Cherry Picking
Sometimes the gut feelings of executives conflict with analysis
One of the worst abuses is to cherry pick results
Cherry picking means
– Using the analytics when the results serve your purpose
– Ignoring the findings when the results conflict with the original plan
10
Outline
Criteria for a Good Analysis
Frame the Problem Correctly
Making Inferences
11
Frame the Problem Correctly (1/6)
How to Frame the Problem?
Great analysis starts with framing the problem correctly
– Assess the data correctly
– Develop a solid analysis plan
– Take both technical and practical considerations into account
Framing the problem is the most important step of an analysis
12
Frame the Problem Correctly (2/6)
Statistical Significance
Statistical significance is used to evaluate the parameter estimates
Statistical significance helps validate the conclusions
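As one possible illustration of checking statistical significance, the sketch below runs a two-proportion z-test using only the Python standard library. The slide does not name a specific test, so the choice of test, the function name `two_proportion_z_test`, and the sign-up counts are all assumptions made for illustration.

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for the difference between two proportions.

    Hypothetical helper: uses the pooled-proportion standard error and
    the normal approximation, which assumes reasonably large samples.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical data: 120 of 1000 visitors signed up under offer A,
# 90 of 1000 under offer B.
z, p_value = two_proportion_z_test(120, 1000, 90, 1000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the observed difference is unlikely to be chance alone; whether the difference matters to the business is a separate question.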
13
Frame the Problem Correctly (3/6)
Never Take Shortcuts
Ensure you have all the data you need
– Given only part of the story, conclusions may be completely wrong
Who has the higher batting average?

Season  Tom   Joe   Winner
1       .252  .255  Joe
2       .259  .266  Joe
3       .237  .241  Joe
4       .253  .255  Joe
5       .256  .257  Joe

Season  Tom Avg  Tom At Bats  Tom Hits  Joe Avg  Joe At Bats  Joe Hits  Winner
1       .252     123          31        .255     341          87        Joe
2       .259     355          92        .266     109          29        Joe
3       .237     139          33        .241     377          91        Joe
4       .253     304          77        .255     294          75        Joe
5       .256     363          93        .257     206          53        Joe
Total   .254     1284         326       .252     1327         335       Tom

– Joe wins every season, but Tom wins overall once at bats and hits are totaled
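The reversal in the batting table can be checked with a few lines of Python. This is a minimal sketch: the at-bat and hit counts are copied from the table, while the variable and function names are illustrative only.

```python
# (at_bats, hits) per season, copied from the batting table.
tom = [(123, 31), (355, 92), (139, 33), (304, 77), (363, 93)]
joe = [(341, 87), (109, 29), (377, 91), (294, 75), (206, 53)]

def season_winners(tom_seasons, joe_seasons):
    """Name the player with the higher batting average in each season."""
    winners = []
    for (tom_ab, tom_hits), (joe_ab, joe_hits) in zip(tom_seasons, joe_seasons):
        tom_avg = tom_hits / tom_ab
        joe_avg = joe_hits / joe_ab
        winners.append("Tom" if tom_avg > joe_avg else "Joe")
    return winners

def career_average(seasons):
    """Total hits divided by total at bats, not the mean of season averages."""
    total_at_bats = sum(at_bats for at_bats, _ in seasons)
    total_hits = sum(hits for _, hits in seasons)
    return total_hits / total_at_bats

winners = season_winners(tom, joe)
tom_career = career_average(tom)
joe_career = career_average(joe)

print("Season winners:", winners)
print(f"Career averages: Tom {tom_career:.3f}, Joe {joe_career:.3f}")
```

The season-by-season view and the totals disagree because the two players' at bats are spread very differently across seasons, which is why the counts behind each average are needed before drawing a conclusion.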
14
Frame the Problem Correctly (4/6)
Business Importance
Statistical significance should match the business perspective
– What are the costs to make the recommended changes?
– How much additional revenue might be generated?
– Is the new approach consistent with the overall strategy?
– Are the new changes executable?
[Figure labels: Statistical significance, Business importance]
15
Frame the Problem Correctly (5/6)
Samples versus Population
Using today's scalable systems, it's possible to work with an entire population
– With big data, we have enough data for a sufficient sample
When a sampling process is needed, it needs to be done correctly
– The bigger the sample, the tighter the margin of error
– Sample size should be suitable for the problem
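The link between sample size and margin of error can be sketched for a simple case. The example below assumes a simple random sample, an estimated proportion, the normal approximation, and a 95% confidence level (z = 1.96); none of these specifics come from the slide, and `margin_of_error` is a hypothetical helper name.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for an estimated proportion.

    Assumes a simple random sample and the normal approximation.
    proportion=0.5 is the worst case (widest margin); z=1.96
    corresponds to roughly 95% confidence.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

for n in (100, 1_000, 10_000, 100_000):
    print(f"n = {n:>7}: +/- {margin_of_error(n):.4f}")
```

Under these assumptions the margin shrinks with the square root of the sample size, so a sample 100 times larger tightens the margin by only a factor of 10.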
16
Frame the Problem Correctly (6/6)
All Data is Needed
Any given problem may require only a small sample of the data
Different samples require different data
– The entire data set should be kept
[Figure: samples s1 through s10]
17
Outline
Criteria for a Good Analysis
Frame the Problem Correctly
Making Inferences
18
Making Inferences
To produce a great analysis, it is necessary to infer potential actions
– Make initial inferences based on analysis