Understanding and Presenting Your Findings


Upload: gina

Post on 25-Feb-2016



Day 4

Understanding and presenting your findings.

Present the basics to tell a story. Do not present advanced statistics and confusion. Today is about creating business intelligence.

Want to solve problems, get a raise, a promotion, or more funding for your group? In the long term there is only one way: add value. Create business intelligence.

The Presentation
The presentation is a very important part of a project; often it can be the most important part.
A good presentation should support the findings, not just mention them.
The supporting statistics and graphs within the presentation can either help people understand or confuse them.
Management will often rely on the presentation to understand the findings from data mining.
Management needs to trust the findings; if the findings are presented poorly, it is difficult to trust them.
A poor presentation can even cause projects to fail. Management will not implement what it neither trusts nor understands.
Unfortunately, many statisticians and computer scientists are lacking in this critical area. They tend to look merely at the results and the numbers in the computer output. As a result, many data analysis projects are not as successful as they should be.
A poor presentation and explanation often leave management unclear on how to understand and act on the findings from the project.

GLM Example
T-Log Data: How can we use this information to understand the differences among the configurations? We compare different types of checkout counter styles and cash registers in terms of speed, using transaction log data (T-log).

Partial T-Log Data:
Configuration of checkout counter: there are 4 types, from 2 different shapes and 2 different cash register types.
nitems = number of items purchased during the transaction
tender = 0 if cash is used, 1 if credit is used
massist = 1 if a manager assists, 0 otherwise
timer1 = time the first item is scanned
timer2 = time the last item is scanned
timer3 = time the transaction is completed

A snapshot of the data.

T-Log Data: It is necessary to create new variables for modeling; the data cannot be used as is. We would want to fit a general linear model to investigate speed in terms of configuration for the different shapes and register types.

An estimate for the time of a transaction could be a new variable equal to timer3-timer1.

What about configuration? Really, we would want two variables: one for the shape of the counter and another for the register type.
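A minimal Python sketch of this variable construction, assuming each T-log row carries the fields listed above plus a hypothetical `config` field (1 to 4), with times in seconds. The register mapping (Register Type 1: configurations 1 and 3; Register Type 2: configurations 2 and 4) comes from a later slide; the shape mapping (Shape 1: configurations 1 and 2; Shape 2: configurations 3 and 4) is an assumption, since the slides do not state it. Splitting the total time into a scan portion and a payment portion is likewise only illustrative.

```python
from dataclasses import dataclass

# Register type per configuration, as given on the "Average Time To Make Payment" slide.
REGISTER = {1: 1, 2: 2, 3: 1, 4: 2}
# ASSUMPTION: counter shape per configuration (not stated in the slides).
SHAPE = {1: 1, 2: 1, 3: 2, 4: 2}


@dataclass
class TLogRow:
    config: int     # checkout configuration 1-4 (hypothetical field name)
    nitems: int     # number of items purchased
    tender: int     # 0 = cash, 1 = credit
    massist: int    # 1 = manager assist, 0 otherwise
    timer1: float   # time first item is scanned (assumed seconds)
    timer2: float   # time last item is scanned
    timer3: float   # time transaction completed


def derive_variables(row: TLogRow) -> dict:
    """Turn one raw T-log row into modeling variables."""
    return {
        "total_time": row.timer3 - row.timer1,    # estimated transaction time
        "scan_time": row.timer2 - row.timer1,     # ASSUMPTION: scan-and-bag portion
        "payment_time": row.timer3 - row.timer2,  # ASSUMPTION: payment portion
        "shape": SHAPE[row.config],
        "register": REGISTER[row.config],
        "nitems": row.nitems,
        "tender": row.tender,
        "massist": row.massist,
    }


example = TLogRow(config=3, nitems=12, tender=1, massist=0,
                  timer1=100.0, timer2=130.0, timer3=145.0)
print(derive_variables(example))
```

Keeping shape and register as separate variables is what later lets the model attribute the per-item difference to the counter and the payment difference to the register.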

A snapshot of the data.

The General Linear Model (GLM). Do not show in the presentation!

Note: This could be done better, but that is for another day.

From this we can see an approximately 0.9-second difference per item caused by the two shapes.

No difference for cash: all have a beta of about 10 seconds.

We can see about a 5-second difference between the two register types for credit.
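The slides do not show the model specification. One plausible specification that would express the three findings above is a linear model with a separate per-item slope for each counter shape, a single payment term for cash, a separate payment term for credit on each register type, and a term for manager assists. A GLM with normal errors and identity link reduces to ordinary least squares, so the sketch below fits such a model in pure Python on simulated data. The coefficient values, the shape mapping, and the column layout are all assumptions for illustration, not the author's model or results.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def design_row(nitems, shape, register, tender, massist):
    """ASSUMED specification: per-item slope by shape, payment term by tender and register."""
    return [
        nitems if shape == 1 else 0,                    # seconds per item, shape 1
        nitems if shape == 2 else 0,                    # seconds per item, shape 2
        1.0 if tender == 0 else 0.0,                    # cash payment, any register
        1.0 if tender == 1 and register == 1 else 0.0,  # credit, register type 1
        1.0 if tender == 1 and register == 2 else 0.0,  # credit, register type 2
        float(massist),                                 # extra time for a manager assist
    ]


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Simulated transactions. TRUE holds made-up coefficients chosen only to mimic
# the size of the differences described above; they are not results.
TRUE = [2.1, 3.0, 10.0, 20.0, 15.0, 540.0]
rng = random.Random(42)
rows, y = [], []
for _ in range(4000):
    config = rng.randint(1, 4)
    shape = 1 if config in (1, 2) else 2      # ASSUMED shape mapping
    register = 1 if config in (1, 3) else 2   # register mapping from the slides
    row = design_row(rng.randint(1, 40), shape, register,
                     rng.randint(0, 1), int(rng.random() < 0.05))
    rows.append(row)
    y.append(sum(b * x for b, x in zip(TRUE, row)) + rng.gauss(0, 1.0))

estimates = fit_ols(rows, y)
print([round(b, 2) for b in estimates])
```

Reading the fitted coefficients in pairs (the two per-item slopes, the two credit terms) is what turns them into statements like "0.9 seconds per item" and "5 seconds for credit".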

Now, How to Present the Results

Which Cash Register and Counter Design Are Best?
Comparing different types of checkout counter styles and cash registers using transaction log data (T-log)

Main Objective
To understand the differences among the checkout counters.
There are 4 different configurations:
Two different shapes of counters
Two different types of cash registers

First, the High-Level Findings (whether these come first or at the end depends on your style; I like them at the end, but I have seen both ways)
Checkout counter shape and cash register type both have an impact on the speed (time) of a transaction.
These findings were statistically significant.

Using various statistical techniques, we found that configuration types 1 and 2 were best.

Although configuration type 2 was on average faster than type 1, we could not statistically prove that type 2 was faster than type 1 in general.
There was a large difference in average time between Type 2 and the other types for manager assists, but we could not rule out that it was just random chance.
We found that Type 2 was faster than all other types, including Type 1, when credit was used.

Remember the file layout (very important!): there are 4 types, from 2 different shapes and 2 different cash register types.

The Next Few Slides Will Highlight the Differences Among the Checkout Configurations
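Statements such as "we could not statistically prove that type 2 was faster than type 1" come from a significance test. The slides do not say which test was used; a minimal sketch, assuming a one-sided comparison of mean transaction times with Welch's unequal-variance standard error and a normal approximation (reasonable with thousands of transactions per configuration), could look like this. The transaction times below are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def one_sided_welch_z(times_a, times_b):
    """Test whether configuration A has a lower mean transaction time than B.

    Uses Welch's unequal-variance standard error. With thousands of
    transactions per group the t distribution is close to normal, so the
    p-value uses the standard normal as an approximation.
    Returns (z, p); a small p supports "A is faster than B".
    """
    se = math.sqrt(variance(times_a) / len(times_a)
                   + variance(times_b) / len(times_b))
    z = (mean(times_a) - mean(times_b)) / se
    return z, NormalDist().cdf(z)


# Hypothetical transaction times in seconds for two configurations.
config_2 = [52.0, 61.5, 48.2, 57.9, 55.1, 60.3]
config_1 = [55.4, 63.0, 50.1, 59.8, 58.7, 62.2]
z, p = one_sided_welch_z(config_2, config_1)
print(f"z = {z:.2f}, one-sided p = {p:.3f}")
```

With only a handful of values, as in this toy input, the normal approximation is rough; it is meant for samples the size of the T-log. A faster average with a large p is exactly the "faster on average, but not proven" situation described above.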

The Total Transaction Time: Final Time Minus the Time the First Item Was Scanned
On average, Configuration 2 is the fastest and is 3 seconds faster per transaction than Configuration 1.

Understanding Manager Assistance for the Cashier and the Configurations

There is a huge time difference between a transaction in which a manager has to assist the cashier due to issues with the cash register and a typical unassisted transaction.

Understanding Manager Assists

Configuration 1 has the lowest percentage of manager assists, but even with 5,000 transactions for each configuration, it is possible that the difference between the configurations is mere random chance.

Time Is a Function of the Number of Items Purchased

As expected, a positive linear relationship.
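The manager-assist comparison above (5,000 transactions per configuration) can be checked with a chi-square test of homogeneity on a 2 x 4 table of assisted and unassisted transactions. A minimal sketch, assuming you have the assist counts per configuration; the slides do not report them, so the counts below are hypothetical. The closed-form p-value used here is specific to 3 degrees of freedom, that is, exactly four configurations.

```python
import math


def assist_rate_chi_square(assists, totals):
    """Chi-square test that all four configurations share one manager-assist rate.

    assists: assisted-transaction counts per configuration.
    totals:  total transaction counts per configuration.
    Returns (statistic, p_value) for the 2 x 4 contingency table.
    """
    if len(assists) != 4 or len(totals) != 4:
        raise ValueError("this closed-form p-value assumes exactly 4 configurations")
    rate = sum(assists) / sum(totals)          # pooled assist rate under H0
    stat = 0.0
    for a, n in zip(assists, totals):
        for observed, expected in ((a, n * rate), (n - a, n * (1 - rate))):
            stat += (observed - expected) ** 2 / expected
    # Chi-square survival function with 3 degrees of freedom (closed form).
    p = (math.erfc(math.sqrt(stat / 2))
         + math.sqrt(2 * stat / math.pi) * math.exp(-stat / 2))
    return stat, p


# Hypothetical assist counts out of 5,000 transactions per configuration.
stat, p = assist_rate_chi_square([96, 110, 104, 107], [5000] * 4)
print(f"chi-square = {stat:.2f}, p = {p:.3f}")
```

A large p-value is what supports the wording "it is possible that the difference is mere random chance"; the lowest observed rate alone does not establish a real difference.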

There Is Definitely a Difference Resulting From Shape and Cash Register Type
The counter shape helps by about 0.9 seconds per item.
[Chart: Time Per Item: Scan and Bag, Shape 1 vs. Shape 2]

Another Way of Looking at the Time Per Item: not nearly as nice as a graph, in my opinion.

The counter shape helps by about 0.9 seconds per item.
[Table: time per item, Shape 1 vs. Shape 2]

There Is Definitely a Difference Resulting From Shape and Cash Register Type

Average Time to Make Payment
Register Type 1: Configurations 1 and 3
Register Type 2: Configurations 2 and 4
Type 2 is better by approximately 5 seconds when credit cards are used. For cash, there is no difference.

Looking at Total Transaction Time Without Manager Assists

Configurations 1 and 2 are best when looking at cash transactions. Configuration 2 is the best overall when looking at credit card transactions only. Thus configuration 2 is best in terms of overall speed.

Conclusions/Recommendations
Focus should be on reducing the need for manager assistance.
A major cause of wasted time is when a manager needs to assist the cashier: approximately an additional 9 minutes is spent.
Configuration types 1 and 2 perform the best.
Given that Type 2 performs better than Type 1 when credit is used, and since we expect the use of credit to increase in Thailand, we would recommend Type 2.
For a day with 2,000 transactions and an average savings of 3 seconds per transaction, the total time saved is 6,000 seconds, or 100 minutes of labor per day. For a day with 12,000 transactions, the savings is 36,000 seconds, or 600 minutes (10 hours) of labor per day.
Note: for a company looking to downsize (eliminate cashiers), this would be useful information.
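The labor-savings arithmetic in the last recommendation can be written as a small helper. The 3 seconds saved per transaction is the average difference reported above, and the transaction counts are the two example day sizes; the function name is only illustrative.

```python
def daily_labor_savings(transactions_per_day, seconds_saved_per_transaction=3):
    """Return cashier time saved per day as (seconds, minutes, hours)."""
    seconds = transactions_per_day * seconds_saved_per_transaction
    return seconds, seconds / 60, seconds / 3600


for n in (2_000, 12_000):
    seconds, minutes, hours = daily_labor_savings(n)
    print(f"{n:,} transactions/day: {seconds:,} s = {minutes:,.0f} min = {hours:.1f} h")
```

This reproduces the figures in the recommendation: 6,000 seconds (100 minutes) for 2,000 transactions and 36,000 seconds (600 minutes, 10 hours) for 12,000.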

Presenting a Logistic Regression Model

The Presentation
A key to understanding is presentation: how do we view our results?
Visualization and presentation are very important.
It is important to know your audience. Your audience determines how you will present what you learn from the logistic regression model.
Senior management in a business is not interested in a theoretical data mining discussion. S/he is interested in how your fraud detection model will help the company.
A fellow statistician would need less visualization, as they already understand, but in my opinion a nice presentation of results can only help.
We will next cover how to look at the variables that enter your model. This is very important for gaining trust in your work.

How Do We View the Independent Variables in the Model?
It is important to interpret each variable in the model and then look at the variable individually against the dependent variable.
Often a variable, when viewed in the model, may have the opposite relationship with the dependent variable from the one it has when looked at separately. This can result from multicollinearity. Multicollinearity will not be covered here.
When creating a model, it is good to think about the variables that enter the model and why they are entered. You may be asked to explain why you chose to keep a certain variable and use it in the model.
One way to investigate each independent variable's relationship with the dependent variable is the same way as when investigating the model.

Sample Partial Presentation of a Fraud Detection Model
Included is only an explanation of the variables in the model and the model validation.

Most Important Factors for Detecting Fraud

Number Of Inquiries For Credit In The Past 6 Months

This slide shows that people with more inquiries (applications) for credit are more likely to be victims of fraud. Perhaps some of the inquiries for credit were made by someone attempting to commit fraud, not by the actual individual.

Number of Inquiries for Credit in the Past 6 Months

This slide shows the same information as the previous slide. This slide is more informative, but many people will think the previous slide is better and easier to understand.
Know your audience (who you present to)!

Percent Match and Mismatch With the Database on Driver License Number

People who are committing fraud are more likely to write a driver license number on the application that differs from the one in your database.

Percent Match and Mismatch With the Database on Zip Code

People who are committing fraud are more likely to write a zip code on the application that differs from the one in your database.

Average Age of Applicant

Younger people are more often victims of fraud.

Gender of Applicant

Females are more often victims of fraud.

Gender of Applicant

Again there is more than one way to present the same thing.Know your audience (who you present to)!

And More Graphs
These simple graphs would be produced for all variables in the model.

Understanding the Fraud Detection Model Performance

By refusing the bottom 10% of applicants, you can reduce fraud by 32% (25,532/80,000).
This model has a KS of 25.82%.
By refusing the bottom 10%, you would have 32 good loans to one fraud, compared with 24 good loans to one fraud before.
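The K-S statistic and the odds quoted here can be recomputed from the credit-score table at the end of this section. A minimal sketch, taking the per-decile counts from that table, ordered from the best score category (10) to the worst (1). Note that the per-decile counts sum to 1,920,001 good loans and 79,999 frauds rather than the table's rounded totals; the results still round to the figures on the slide.

```python
# (score category, good loans, frauds), best category (10) to worst (1),
# copied from the credit-score table.
DECILES = [
    (10, 196_596, 3_404), (9, 196_170, 3_830), (8, 195_745, 4_255),
    (7, 194_894, 5_106), (6, 194_043, 5_957), (5, 193_617, 6_383),
    (4, 192_766, 7_234), (3, 191_489, 8_511), (2, 190_213, 9_787),
    (1, 174_468, 25_532),
]


def ks_statistic(deciles):
    """Maximum gap between cumulative % of good loans and cumulative % of frauds."""
    total_good = sum(good for _, good, _ in deciles)
    total_fraud = sum(fraud for _, _, fraud in deciles)
    cum_good = cum_fraud = 0
    best = 0.0
    for _, good, fraud in deciles:
        cum_good += good
        cum_fraud += fraud
        best = max(best, cum_good / total_good - cum_fraud / total_fraud)
    return best


def refuse_worst_decile(deciles):
    """Good-to-fraud odds before/after refusing the worst decile, and share of fraud removed."""
    total_good = sum(good for _, good, _ in deciles)
    total_fraud = sum(fraud for _, _, fraud in deciles)
    _, worst_good, worst_fraud = deciles[-1]
    odds_before = total_good / total_fraud
    odds_after = (total_good - worst_good) / (total_fraud - worst_fraud)
    return odds_before, odds_after, worst_fraud / total_fraud


print(f"KS = {ks_statistic(DECILES):.2%}")
before, after, removed = refuse_worst_decile(DECILES)
print(f"odds {before:.1f} -> {after:.1f}; fraud reduced by {removed:.0%}")
```

Run as is, this prints a KS of 25.82%, odds of 24.0 before and 32.0 after refusing the bottom decile, and a 32% fraud reduction, matching the slide.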

Predicting Phone Usage
Why: whom to target for phone packages?
True customers: many use other products but not the phone. They have a phone, just not with you. They are prospective phone customers.

You can leverage phone usage: predict phone usage for non-phone users in your customer database. The highest predicted phone users get a promotion.

This can be done using your phone customers who have multiple products.
How do you know if the model you create works? That is the subject of the next few slides.

Creating the Model
Minutes used is continuous, thus a general linear model.

The results of the model are predicted minutes.

Are the predictions any good? You must think about how the model will be applied. For our example, think of a mail-out campaign or a phone SMS campaign.

Validating the Model for Marketing
Assume I want to market to groups 3 and 4.
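A sketch of how this validation could be tabulated, assuming you have each customer's actual minutes and the model's predicted minutes (the inputs below are hypothetical). Customers are binned into the four minute categories used for this validation (0-1000, 1001-1500, 1501-2000, 2001+), everyone predicted to be in category 3 or 4 is mailed, and the cells are counted: A (correct mailing), B (missed opportunity), and C (mailing to less desirable customers). The capture rate A / (A + B) is the kind of figure reported as "79.4% of true 3s and 4s correctly identified".

```python
def minutes_category(minutes):
    """Map minutes used to four categories: 1 = 0-1000, 2 = 1001-1500,
    3 = 1501-2000, 4 = 2001+."""
    if minutes <= 1000:
        return 1
    if minutes <= 1500:
        return 2
    if minutes <= 2000:
        return 3
    return 4


def mailing_validation(actual_minutes, predicted_minutes, target=(3, 4)):
    """Count cells A, B, C for a mailing to everyone predicted to be in a target category."""
    a = b = c = 0
    for actual, predicted in zip(actual_minutes, predicted_minutes):
        is_target = minutes_category(actual) in target
        mailed = minutes_category(predicted) in target
        if is_target and mailed:
            a += 1          # A: correct mailing
        elif is_target:
            b += 1          # B: missed opportunity
        elif mailed:
            c += 1          # C: mailing to less desirable customers
    capture_rate = a / (a + b) if a + b else 0.0  # share of true targets reached
    return {"A": a, "B": b, "C": c, "capture_rate": capture_rate}


# Hypothetical actual and predicted minutes for five customers.
actual = [500, 1200, 1600, 2500, 3000]
predicted = [1700, 1100, 1800, 900, 2200]
print(mailing_validation(actual, predicted))
```

Framing the check as mailings rather than as prediction error keeps the validation tied to how the model will actually be used.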

The categories:
1: 0-1000 minutes
2: 1001-1500 minutes
3: 1501-2000 minutes
4: 2001+ minutes
[Table: actual vs. predicted category, with cells labeled A, B, and C]
A is correct mailings.
B is missed opportunity.
C is mailing to less desirable customers.
Note: We only missed mailing to 198 people with 2001+ minutes; 79.4% of the true 3s and 4s were correctly identified. Also, we would only mail to 67 people with fewer than 1001 minutes.

Customer Profiling and Customer Value
Sample marketing project: students were to rank customers according to revenue and risk.

Two Main Objectives
To Understand Your Customers
To Understand the Value of Your Customers
To help make marketing strategies.

First, Who Are Your Customers?
We looked at all 15,045 customers to understand who they are.
Gender: More Women than Men

Plan Type
Approximately 45% are in plan type 5. Plan type 5 is, without a doubt, the most popular plan type.

Plan Type and Average Minutes Used
[Chart: Plan Type and Mean Minutes Used; y-axis: Minutes Used]
Only in plan type 5 do customers use less than the minimum. In this plan type, customers give you free money. The total free money is $62,942 for December 2005, 0.2% of revenue (total = $34,282,000).

Plan Type and Mean Minutes Used

This slide is terrible! It has the same information as the last slide but without highlighting what the reader should learn from the slide. This is to show the importance of making your point clear in a presentation.

Present Payment Status
Although most people are paying you and are current at the present time, 3% are in default.

Worst Payment Status
During the past 12 months, a little more than 50% of your customers have been 30 days overdue in payment at least once.
During the past 12 months, a little more than 18% of your customers have been 60 days overdue in payment at least once.
Less than 25% have paid on time all the time for the past 12 months.

Present and Worst Ever Status
Most customers are late at some point but ultimately pay: approximately 66%. These people will most likely pay.
You should do many more graphs. Also, the graphs should be made much prettier.

Ranking Your Customers According to Their Value to You
First we will discuss what value is, and then we will discuss one way to rank your customers in terms of value.

What Is Value?
Value is a combination of two things:
Profit: The more profitable the customer, the more valuable he or she is to your company.
Risk: Unfortunately, some of your customers are not paying you (approximately 3%). This is lost money. People using their cell phones but not paying you actually have negative value, as they cost you money. For this reason, we feel risk is an important factor in understanding customer value.

Profit
Due to the sensitive nature of profit margins, we could not use actual profit.

Thus, to understand your most profitable customers we used revenue.Revenue was calculated using plan type and minutes used. (Should give more details).

Revenue: We created Five Categories For Revenue

Risk
To understand risk, we looked at all 12 months of the payment history data.

Ultimately we decided to use the worst ever payment status for the past 12 months as a proxy for risk. The higher the value, the higher the risk.

Thus risk has 5 categories, with values ranging from 0 to 4, 0 being the least risky and 4 the most risky. A value of 4 actually means the person is in default.
[Chart annotations: "These are the least risky"; "Each level indicates an additional increase in risk"]
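A minimal sketch of the worst-ever-status risk proxy, assuming each customer's payment history is a list of 12 monthly status codes from 0 to 4. Only 0 = least risky and 4 = default are stated in the slides; the meaning of the intermediate codes (for example, 1 = 30 days overdue, 2 = 60 days overdue) is an assumption based on the payment-status slides, and the histories below are hypothetical.

```python
from collections import Counter


def worst_ever_status(monthly_status):
    """Risk proxy: the worst (highest) payment status over the past 12 months."""
    if len(monthly_status) != 12:
        raise ValueError("expected 12 monthly status codes")
    if any(s not in (0, 1, 2, 3, 4) for s in monthly_status):
        raise ValueError("status codes must be 0-4")
    return max(monthly_status)


def risk_distribution(histories):
    """Share of customers at each risk level 0-4."""
    counts = Counter(worst_ever_status(h) for h in histories)
    return {level: counts[level] / len(histories) for level in range(5)}


# Hypothetical histories. ASSUMED codes: 0 = current, 1 = 30 days overdue,
# 2 = 60 days overdue, 3 = further overdue, 4 = default.
histories = [
    [0] * 12,
    [0] * 11 + [1],
    [2] + [0] * 11,
    [4] * 12,
]
print(risk_distribution(histories))
```

Taking the maximum means a single bad month dominates the score, which is why the next slide notes that present status should also be incorporated.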

Risk
This is a very basic concept of risk; it should incorporate present status as well.

Understanding Value as a Function of Risk and Revenue: Percent of Customers
This is the 4th group: yellow, for caution.

These are your best customers: gold. They pay, they are low risk, and they use a lot of phone time. They make up 52.50% of your customers.
These are your 2nd-best customers: green. They pay, they are low risk, and they don't use much phone time. They make up 21.67% of your customers.
These customers are 3rd: grey. They are grey since they are in a grey area and are risky.
This is the 5th group: red, for stop.

Understanding Value as a Function of Risk and Revenue: Percent of Revenue

Your 2nd-best customers, green, make up approximately 14.84% of your revenue.
Your best customers, gold, make up approximately 59.33% of your revenue.
Your two top groups make up 74.17% of your total revenue.

Value and Phone Usage

Value And Gender
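The "Percent of Customers" and "Percent of Revenue" views above are two aggregations of the same customer table. A minimal sketch, assuming each customer already has a value-segment label (gold, green, grey, yellow, red) and a revenue figure; the rule that assigns segments from risk and revenue is not spelled out in the slides, so it is left as an input, and the records below are hypothetical.

```python
from collections import defaultdict


def segment_shares(customers):
    """customers: iterable of (segment, revenue) pairs.

    Returns {segment: (share_of_customers, share_of_revenue)}.
    """
    count = defaultdict(int)
    revenue = defaultdict(float)
    for segment, rev in customers:
        count[segment] += 1
        revenue[segment] += rev
    n = sum(count.values())
    total_revenue = sum(revenue.values())
    return {s: (count[s] / n, revenue[s] / total_revenue) for s in count}


# Hypothetical customers: (segment, monthly revenue).
sample = [("gold", 300.0), ("gold", 300.0), ("green", 100.0), ("red", 0.0)]
for segment, (pct_customers, pct_revenue) in segment_shares(sample).items():
    print(f"{segment}: {pct_customers:.0%} of customers, {pct_revenue:.0%} of revenue")
```

Summing the revenue shares of the top two segments gives figures like the 74.17% (59.33% + 14.84%) quoted above.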

Recommendations
You want to keep your highest-value customers happy.
Consider creating an additional phone plan, since more than 40% of your customers are in phone plan type 5. This recommendation is in part a result of finding customers who do not even use the minimum minutes for plan type 5.
Most of your customers pay late at least once in 12 months. Consider charging late fees to increase revenue. Adding up all of the late months over the past 12 months for all of your customers who are not in default gives approximately 27,478 late-fee charges per year. If you charge a late fee of 10 baht per late month, you could make approximately an additional 274,780 baht per year.
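The late-fee estimate can be sketched as counting late months across customers who are not in default and multiplying by the fee. This assumes the 0-4 monthly status coding described for risk (any status above 0 counts as a late month, 4 = default) and treats the last month in each history as the present status; both are assumptions, and the histories below are hypothetical.

```python
def late_fee_estimate(histories, fee_baht=10, default_status=4):
    """Return (late-month count, fee revenue in baht) over non-defaulted customers.

    histories: one list of 12 monthly status codes per customer, oldest first.
    ASSUMPTION: the last code is the present status; status > 0 means late.
    """
    late_months = 0
    for statuses in histories:
        if statuses[-1] == default_status:   # skip customers currently in default
            continue
        late_months += sum(1 for s in statuses if s > 0)
    return late_months, late_months * fee_baht


histories = [
    [0] * 12,                 # always on time
    [1, 0, 1] + [0] * 9,      # late twice
    [0] * 11 + [4],           # in default now: excluded
]
print(late_fee_estimate(histories))
```

With the slide's count of 27,478 late months, the same multiplication gives 27,478 x 10 = 274,780 baht per year.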

Imagine a project costing 500,00 baht leading to an extra 274,780 baht/year in addition to doing the requested work. You have proven your worth and more as a consultant. Statements such as the one above should be checked with the client before making them: are they already using late fees?

Simple Is Better in Presenting
Keep It Simple
Don't present what you don't understand.

People want to understand what you are presenting. Do you buy products you don't understand just because the salesman says you should? No.

Leverage what you learn from the advanced statistics and create the supporting materials to convey the story: business intelligence.

Thanks

For basic questions: [email protected]

Relationship With Fraud | The Variable
+ | Number of inquiries for credit in the past 6 months
- | Driver license number match
- | Zip code match
- | Age of applicant
- | Gender
Etc. | Etc.
+ | Have a home loan
- | Have ever declared bankruptcy

Credit Score Category | Total Number of Loans | Cumulative Percent | Number of Good Loans | Number of Frauds | Percent of Good Loans | Percent of Bad Loans (Frauds) | Cumulative Percent Good Loans | Cumulative Percent Frauds | The Difference | Cumulative Odds
10 | 200,000 | 10% | 196,596 | 3,404 | 10.2% | 4.3% | 10.2% | 4.3% | 5.98% | 57.8
9 | 200,000 | 20% | 196,170 | 3,830 | 10.2% | 4.8% | 20.5% | 9.0% | 11.41% | 54.3
8 | 200,000 | 30% | 195,745 | 4,255 | 10.2% | 5.3% | 30.7% | 14.4% | 16.29% | 51.2
7 | 200,000 | 40% | 194,894 | 5,106 | 10.2% | 6.4% | 40.8% | 20.7% | 20.06% | 47.2
6 | 200,000 | 50% | 194,043 | 5,957 | 10.1% | 7.4% | 50.9% | 28.2% | 22.72% | 43.3
5 | 200,000 | 60% | 193,617 | 6,383 | 10.1% | 8.0% | 61.0% | 36.2% | 24.82% | 40.5
4 | 200,000 | 70% | 192,766 | 7,234 | 10.0% | 9.0% | 71.0% | 45.2% | 25.82% | 37.7
3 | 200,000 | 80% | 191,489 | 8,511 | 10.0% | 10.6% | 81.0% | 55.9% | 25.16% | 34.8
2 | 200,000 | 90% | 190,213 | 9,787 | 9.9% | 12.2% | 90.9% | 68.1% | 22.83% | 32.0
1 | 200,000 | 100% | 174,468 | 25,532 | 9.1% | 31.9% | 100.0% | 100.0% | 0.00% | 24.0
Total | 2,000,000 | | 1,920,000 | 80,000 | 100.0% | 100.0% | | | |
K-S statistic = Maximum Difference = 25.82%