2002/4/10 IDSL seminar

Estimating Business Targets

Advisor: Dr. Hsu

Graduate: Yung-Chu Lin

Data Source: Datta et al., KDD01, pp. 420-425.


Abstract

Propose a new solution to the classical econometric task of frontier analysis

Combine nearest neighbor methods and classical statistical methods

Identify under-marketed customers

Benchmark regional directory divisions

Outline

Motivation

Objective

Historical approaches

Target estimation methodology

Case study

Conclusion

Personal opinion

Motivation

Setting targets is a critical task

Traditionally, the target of each entity is set to the average across all entities

Two challenges

– The characteristics of the entities heavily influence the outcome

– The problem is inherently unsupervised

Objective

Provide a methodology for estimating unsupervised maximal or minimal targets

Setting revenue target expectations for individual customers

Revenue target setting for regional yellow page directories

Historical Approaches

Mathematical programming

Economics

Mathematical Programming

$\phi_i \le g(x_i)$ for every observation $i$

where $\phi_i$ is the target for $x_i$, the attribute vector of the $i$th observation, and $g$ is the frontier

Drawback: sensitive to errors and outliers, since it assumes that all observed targets define the feasible space
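To make the outlier sensitivity concrete, here is a minimal sketch of one deterministic envelopment estimator, the output-oriented free disposal hull (FDH). It is swapped in purely for illustration; the slide does not say which mathematical program is meant. Each observation's frontier value is the largest observed target among observations whose inputs are all no greater, so every observed target lies on or below the frontier. The function name `fdh_frontier` and the toy data are hypothetical.

```python
from typing import Sequence


def fdh_frontier(xs: Sequence[Sequence[float]], ys: Sequence[float]) -> list[float]:
    """Output-oriented free disposal hull (FDH) frontier (illustrative stand-in).

    For each observation i, the frontier value is the largest observed
    target y_j among observations j whose inputs are all <= those of i.
    Observation i always qualifies, so the frontier is never below y_i.
    """
    frontier = []
    for xi in xs:
        best = max(
            yj
            for xj, yj in zip(xs, ys)
            if all(a <= b for a, b in zip(xj, xi))
        )
        frontier.append(best)
    return frontier


# Hypothetical toy data: one input (e.g. business size) and one target (revenue).
xs = [[1.0], [2.0], [3.0], [4.0]]
ys = [10.0, 12.0, 11.0, 15.0]
print(fdh_frontier(xs, ys))  # [10.0, 12.0, 12.0, 15.0]

# One erroneous record (revenue 100 at size 2) raises the frontier
# for every observation at least that large.
ys_outlier = [10.0, 100.0, 11.0, 15.0]
print(fdh_frontier(xs, ys_outlier))  # [10.0, 100.0, 100.0, 100.0]
```

A single bad value moves the targets of three of the four entities, which is the weakness the slide attributes to treating all observations as defining the feasible space.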

Economics

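$\phi_i = g(x_i) - \varepsilon_i$

where $\varepsilon_i \ge 0$ is a non-negative error term, so every observation lies on or below the frontier $g$

Drawback: requires a model for both the error term and $g$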
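One textbook way to fit a frontier of the form "observed target = g(x) minus a non-negative error" is corrected ordinary least squares (COLS): fit g by least squares, then shift the intercept up by the largest residual so that every error term becomes non-negative. The sketch below assumes a linear g with a single input. It is shown only to illustrate why this approach needs explicit models for g and for the error term; it is not claimed to be the specific econometric method the paper compares against, and `cols_frontier` is a hypothetical name.

```python
def cols_frontier(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Corrected OLS (illustrative): fit y = a + b*x by least squares, then
    shift the intercept up by the largest residual so the line envelops
    the data. Returns (intercept, slope) of the frontier g(x) = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Largest residual y - fit; adding it to the intercept makes all errors >= 0.
    shift = max(y - (intercept + slope * x) for x, y in zip(xs, ys))
    return intercept + shift, slope


# Hypothetical toy data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 4.0, 6.0]
a, b = cols_frontier(xs, ys)
# Error term of each observation: epsilon_i = g(x_i) - y_i.
errors = [a + b * x - y for x, y in zip(xs, ys)]
```

Every error term comes out non-negative, and the observation with the largest residual sits exactly on the fitted frontier.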

Target Estimation Methodology

Nearest neighbor vs. clustering

The neighborhoods

The distance function

Target estimation from the neighborhoods

A heuristic for comparing neighborhoods

Nearest Neighbor vs. Clustering

Time complexity

– Clustering is faster than nearest neighbor search

Problem with clustering

– Two similar entities may fall into different clusters

– The higher the dimension, the more serious the problem

– Nearest neighbor methods avoid this, since every entity gets its own neighborhood

The Neighborhoods

$x_i$: the $i$th observation

$y_i$: the variable containing its target value

$n_i$: the neighborhood of $x_i$, a set of observations $\{x_i, x_j, \dots\}$
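A minimal brute-force sketch of building the neighborhoods. It assumes each $n_i$ holds $x_i$ itself plus its k nearest other observations; whether the paper's K counts $x_i$ itself is not stated on the slide. Euclidean distance is used as a default, and the function name `neighborhoods` is hypothetical. Because every observation gets a neighborhood centered on itself, two similar entities are never separated the way a cluster boundary can separate them.

```python
import math
from typing import Callable, Sequence

Point = Sequence[float]


def neighborhoods(
    xs: Sequence[Point],
    k: int,
    dist: Callable[[Point, Point], float] = math.dist,
) -> list[list[int]]:
    """Return n_i for every observation as a list of indices.

    Each neighborhood starts with i itself, followed by the k nearest
    other observations in increasing distance (ties broken by index).
    Brute force: O(n^2) distance evaluations.
    """
    result = []
    for i, xi in enumerate(xs):
        others = sorted(
            (j for j in range(len(xs)) if j != i),
            key=lambda j: (dist(xi, xs[j]), j),
        )
        result.append([i] + others[:k])
    return result


# Hypothetical 2-D observations.
xs = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
print(neighborhoods(xs, k=2))  # [[0, 1, 2], [1, 0, 2], [2, 0, 1], [3, 2, 1]]
```

The O(n^2) cost is the time-complexity price of nearest neighbor methods mentioned in the comparison with clustering.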

The Distance Function

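Continuous attributes: standardize, then take the weighted difference

Nominal attributes: 0 if the values match, 1 otherwise

e.g. Continuous: (2,1) vs. (3,4)

$d = \sqrt{(1 \cdot w_1)^2 + (3 \cdot w_2)^2}$

Nominal: (a,b) vs. (a,c)

$d = \sqrt{0^2 + (1 \cdot w_2)^2} = w_2$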
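A minimal sketch of a weighted distance over mixed attributes, assuming the form $\sqrt{\sum_k (w_k \delta_k)^2}$ that the worked examples suggest, that numeric attributes are already standardized, and that a nominal attribute contributes 0 on a match and 1 otherwise. The function name `weighted_distance` is hypothetical.

```python
import math
from typing import Sequence, Union

Value = Union[float, str]


def weighted_distance(a: Sequence[Value], b: Sequence[Value], w: Sequence[float]) -> float:
    """sqrt(sum_k (w_k * delta_k)^2) over mixed attributes.

    delta_k is |a_k - b_k| for numeric (assumed already standardized)
    attributes and 0/1 (match/mismatch) for nominal (string) ones.
    """
    total = 0.0
    for ak, bk, wk in zip(a, b, w):
        if isinstance(ak, str) or isinstance(bk, str):
            delta = 0.0 if ak == bk else 1.0
        else:
            delta = abs(ak - bk)
        total += (wk * delta) ** 2
    return math.sqrt(total)


print(weighted_distance((2, 1), (3, 4), (1.0, 1.0)))           # sqrt(10), about 3.1623
print(weighted_distance(("a", "b"), ("a", "c"), (1.0, 0.5)))   # 0.5, i.e. w_2
```

With unit weights the continuous example reduces to $\sqrt{1^2 + 3^2} = \sqrt{10}$, and the nominal example depends only on the weight of the mismatched second attribute.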

Target Estimation From the Neighborhoods

Let $y_{i(1)}, y_{i(2)}, \dots, y_{i(k)}$ be the order statistics of the target values in the neighborhood $n_i$, so that $y_{i(1)}$ is the largest
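The slide's formula for turning these order statistics into a target is not preserved in this transcript. The sketch below only shows how the order statistics of a neighborhood are obtained, and, as a clearly hypothetical placeholder, uses the largest value $y_{i(1)}$ as the maximal target. The paper's actual estimator may combine the order statistics differently, and both function names are hypothetical.

```python
from typing import Sequence


def order_statistics(ys: Sequence[float], neighborhood: Sequence[int]) -> list[float]:
    """Target values of the neighborhood, sorted so that index 0 is y_i(1), the largest."""
    return sorted((ys[j] for j in neighborhood), reverse=True)


def placeholder_max_target(ys: Sequence[float], neighborhood: Sequence[int]) -> float:
    """Hypothetical placeholder estimator: the neighborhood maximum y_i(1).
    Not the paper's formula, which this transcript does not preserve."""
    return order_statistics(ys, neighborhood)[0]


ys = [10.0, 12.0, 11.0, 30.0]
n_0 = [0, 1, 2]  # hypothetical neighborhood of observation 0
print(order_statistics(ys, n_0))        # [12.0, 11.0, 10.0]
print(placeholder_max_target(ys, n_0))  # 12.0
```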

A Heuristic for Comparing Neighborhoods

Maximal frontier: $E(x_i)$ ranges from 0 to 1

Minimal frontier: $E(x_i) \ge 1$
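The stated ranges are what one gets if E(x_i) is read as the ratio of the observed value to the estimated target, with the target taken as the neighborhood maximum (maximal frontier) or minimum (minimal frontier). Both the ratio definition and the max/min target are assumptions here, since the slide's formula was not captured; `efficiency` is a hypothetical name. Because each neighborhood contains the entity itself, the maximum is never below y_i and the minimum never above it, which yields the two ranges for positive targets.

```python
from typing import Sequence


def efficiency(
    ys: Sequence[float],
    neighborhoods: Sequence[Sequence[int]],
    maximal: bool = True,
) -> list[float]:
    """Hypothetical E(x_i) = y_i / target_i, with a max (or min) neighborhood target.

    Each neighborhood contains i itself, so for positive y:
      maximal=True  -> 0 < E <= 1 (1 means the entity is on the frontier)
      maximal=False -> E >= 1
    """
    pick = max if maximal else min
    return [ys[i] / pick(ys[j] for j in n) for i, n in enumerate(neighborhoods)]


ys = [10.0, 12.0, 6.0]
ns = [[0, 1], [1, 0], [2, 0]]  # hypothetical neighborhoods, each starting with i
print(efficiency(ys, ns))                 # [0.8333333333333334, 1.0, 0.6]
print(efficiency(ys, ns, maximal=False))  # [1.0, 1.2, 1.0]
```

Under this reading, a low E(x_i) on a maximal frontier flags an entity that performs well below its most similar peers, which matches the goal of finding under-marketed customers.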

Case Study

Target revenues for directory book advertisers

Target revenue for regional directories

(1) Target Revenues for Directory Book Advertisers

Goal

– Find businesses whose spending is low relative to businesses with otherwise similar characteristics

Three categories of data available

– Advertiser: e.g. number of employees

– Directory: e.g. distribution size

– Market: e.g. median household income

Calculating Nearest Neighbors

Standardize continuous data with the natural log

K = 4

Weight the variables equally

– But decrease the weights for many of the directory and market variables
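A minimal sketch of this preprocessing, assuming every continuous variable is positive (the natural log is undefined otherwise). The column names, the reduced weight of 0.5 and the example records are all hypothetical; the slide gives neither the variables in full nor the actual weights. Neighbors are found with a weighted Euclidean distance on the log values, excluding the entity itself.

```python
import math

# Hypothetical variables, one per data category (advertiser, directory, market).
COLUMNS = ["employees", "distribution_size", "median_household_income"]
# Equal weights, except a reduced (hypothetical) weight for directory/market variables.
WEIGHTS = [1.0, 0.5, 0.5]
K = 4


def log_standardize(rows: list[dict[str, float]]) -> list[list[float]]:
    """Replace every (positive) continuous value by its natural log."""
    return [[math.log(row[c]) for c in COLUMNS] for row in rows]


def k_nearest(points: list[list[float]], i: int, k: int = K) -> list[int]:
    """Indices of the k observations nearest to point i (excluding i) under a
    weighted Euclidean distance on the log-transformed values."""
    def dist(j: int) -> float:
        return math.sqrt(
            sum((w * (a - b)) ** 2 for w, a, b in zip(WEIGHTS, points[i], points[j]))
        )
    return sorted((j for j in range(len(points)) if j != i), key=dist)[:k]


rows = [
    {"employees": 5, "distribution_size": 20_000, "median_household_income": 40_000},
    {"employees": 6, "distribution_size": 22_000, "median_household_income": 41_000},
    {"employees": 50, "distribution_size": 20_000, "median_household_income": 40_000},
    {"employees": 4, "distribution_size": 90_000, "median_household_income": 60_000},
    {"employees": 7, "distribution_size": 21_000, "median_household_income": 39_000},
    {"employees": 5, "distribution_size": 25_000, "median_household_income": 45_000},
]
points = log_standardize(rows)
print(k_nearest(points, 0))  # [5, 1, 4, 3]
```

The log transform compresses large absolute differences (such as distribution sizes in the tens of thousands) while keeping relative ones, and the 50-employee business, ten times larger than advertiser 0, is the one left out of its neighborhood.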

Distribution of E(x) for Advertisers

A Decision Tree to Predict phi -xi

(2) Target Revenue for Regional Directories

Goal

– Benchmark regional directory divisions

Separate the data into two sets

– Training set: 80%

– Test set: 20%

K=4
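A minimal sketch of an 80%/20% split. The slide does not say whether the split was random or stratified (for example by book type), so a seeded random split is assumed; the directory identifiers and the function name `train_test_split` are hypothetical. One natural reading, also not confirmed by the slide, is that the K = 4 neighbors of each test directory are drawn from the training set only.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def train_test_split(
    items: Sequence[T], train_fraction: float = 0.8, seed: int = 0
) -> tuple[list[T], list[T]]:
    """Shuffle a copy of the items and cut it into training and test sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


directories = [f"directory_{n}" for n in range(10)]  # hypothetical IDs
train, test = train_test_split(directories)
print(len(train), len(test))  # 8 2
```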

Book Type

System book

– Covers an entire serving area

System-neighborhood book

– Covers a smaller number of geographic areas within the franchise area

Neighborhood book

– Covers areas outside the telephone company’s franchise area

Four Different Distributions labeled according to the legend

Panels: neighborhood books, system books, non-system books

The x-axis shows log(distribution) and the y-axis shows E(x)

Conclusion

Present a general data mining methodology for estimating business targets by frontier analysis

First case

– Increase sales focus on under-marketed customers

– Increase potential revenue by several million dollars

Second case

– Estimate optimal revenue performance targets for directory divisions

– The revenue increase for directory books is at least several million dollars

Personal opinion

Combining several existing methodologies or disciplines can produce a powerful new one