Online Advertising and Large-Scale Model Fitting
Wush Wu, 2014-10-24
Outline
Introduction to Online Advertising
Handling Real Data
  Data Engineering
  Model Matrix
  Enhance Computation Speed of R
Fitting Models to Large-Scale Data
  Batch Algorithm: Parallelizing Existing Algorithms
  Online Algorithm: SGD, FTPRL and Learning Rate Schedules
Display Advertising Challenge
Ad Formats: Pre-Roll Video Ads
Ad Formats: Banner/Display Ads
AdWords Search Ads
Related Content Ads
Online Advertising is Growing Rapidly
Why Is Online Advertising Growing?
Wide reach
Target oriented
Quick conversion
Highly informative
Cost-effective
Easy to use
Measurable
"Half the money I spend on advertising is wasted; the trouble is I don't know which half." (commonly attributed to John Wanamaker)
How do we measure the online ad?
User behavior on the internet is trackable.
We know who watches the ad.
We know who buys the product.
We collect data for measurement.
How do we collect the data?
Performance-based advertising
Pricing Models
Cost-Per-Mille (CPM)
Cost-Per-Click (CPC)
Cost-Per-Action (CPA) or Cost-Per-Order (CPO)
To Improve Profit
Display the ad with the highest expected revenue: Click-Through Rate (CTR) * CPC, or Conversion Rate (CVR) * CPO
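To make the rule concrete, here is a minimal R sketch that ranks candidate ads by expected revenue per impression, CTR * CPC, scaled by 1000 so it is comparable with a CPM price (often called eCPM). The data frame `ads`, its columns, and the helper `rank_by_ecpm` are hypothetical, and the CTR values stand in for a model's predictions.

```r
# Hypothetical candidates: predicted CTR and the advertiser's CPC bid for each ad.
ads <- data.frame(
  ad_id = c("a", "b", "c"),
  ctr   = c(0.010, 0.002, 0.005),
  cpc   = c(0.5, 3.0, 0.8)
)

# Expected revenue per impression under CPC pricing is CTR * CPC;
# multiplying by 1000 gives the expected revenue per mille (eCPM).
rank_by_ecpm <- function(ads) {
  ads$ecpm <- ads$ctr * ads$cpc * 1000
  ads[order(ads$ecpm, decreasing = TRUE), ]
}

ranked <- rank_by_ecpm(ads)
ranked$ad_id[1]  # "b": the ad with the lowest CTR wins because of its higher bid
```

The same ranking works for CPA/CPO pricing by replacing `ctr * cpc` with a predicted CVR times the CPO bid.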
Estimating the probability of a click (conversion) is the central problem
Rule Based
Statistical Modeling (Machine Learning)
System
Website → Ad Request → Recommendation → Ad Delivering
Website → Log Server
Model Fitting
Batch
Online
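The Online branch means the model is updated as each logged impression arrives instead of being refitted on all stored logs. A minimal sketch of one such update, assuming plain logistic regression trained by stochastic gradient descent with a constant learning rate (this is plain SGD, not the FTPRL algorithm named in the outline); `sgd_update` is a hypothetical name and the stream is simulated.

```r
# One plain SGD step on the log loss for a single logged impression.
# w: current weights, x: feature vector, y: 1 if clicked else 0, eta: learning rate.
sgd_update <- function(w, x, y, eta = 0.1) {
  p <- plogis(sum(w * x))  # predicted click probability
  w - eta * (p - y) * x    # gradient of the log loss with respect to w is (p - y) * x
}

# Simulated stream: the true model is logit(P(click)) = -1 + 2 * x2.
set.seed(2)
w <- c(0, 0)
for (t in 1:20000) {
  x <- c(1, rnorm(1))
  y <- rbinom(1, 1, plogis(-1 + 2 * x[2]))
  w <- sgd_update(w, x, y, eta = 0.02)
}
w  # should end up near c(-1, 2)
```

With a constant learning rate the weights keep fluctuating around the optimum; decaying learning-rate schedules are one way to reduce that noise.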
Rule Based
Let the advertiser select the target group
Statistical Modeling
We log each display (impression) and collect the user's response (click or conversion)
Features
Ad
Channel
User
Features of Ad
Ad type: Text, Figure, Video
Ad content: Fashion, Health, Game
Features of Channel
Visibility
Features of User
Sex
Age
Location
Behavior
Real Features
Weinan Zhang, Shuai Yuan, Jun Wang, and Xuehua Shen. Real-Time Bidding Benchmarking with iPinYou Dataset.
Know How vs. Know Why
We usually do not study why an ad has a high CTR
A small improvement in accuracy implies a large improvement in profit
Predictive Analysis
Data
School: static, cleaned, public
Commercial: dynamic, contains errors, private
Data Engineering
Impression
CLICK_TIME        CLIENT_IP      CLICKED ADID
2014/05/17 ...    2.17.x.x       133594
2014/05/17 ...    140.112.x.x    134811
Click
+
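A common step here is to join the impression log with the click log so every impression gets a 0/1 label. A minimal sketch with base R `merge`, assuming both logs share a hypothetical impression key `imp_id`; the toy rows are made up for illustration and are not taken from the logs above.

```r
# Toy logs; imp_id is a hypothetical key written to both the impression and click logs.
impressions <- data.frame(
  imp_id = c(1, 2, 3, 4),
  ad_id  = c(101, 102, 102, 101)
)
clicks <- data.frame(imp_id = c(2, 4, 4), clicked = 1L)

# A user may click the same impression twice; keep one click per impression.
clicks <- unique(clicks)

# Left join: impressions without a matching click get NA, which is relabeled as 0.
train <- merge(impressions, clicks, by = "imp_id", all.x = TRUE)
train$clicked[is.na(train$clicked)] <- 0L
train
```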
Data Engineering with R
http://wush978.github.io/REngineering/
Automation of R jobs
  Convert R scripts to command-line applications
  Learn modern tools such as Jenkins
Connections between multiple machines
  Learn ssh
Logging
  Linux tools: bash redirection, tee
  R package: logging
R error handling: try, tryCatch
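These points combine naturally: a script run with `Rscript` reads its arguments, logs to stderr, and reports failure through its exit status so a scheduler such as Jenkins can detect it. A minimal base-R sketch; the job body (counting rows of a CSV) and the helper names `run_job` and `main` are hypothetical, and the `logging` package is not used here.

```r
# Hypothetical job body: read a CSV whose path is the first argument, count its rows.
run_job <- function(args) {
  if (length(args) < 1) stop("usage: Rscript job.R <input.csv>")
  d <- read.csv(args[1])
  nrow(d)
}

# Wrap the job in tryCatch, log to stderr via message(), return an exit status.
main <- function(args = commandArgs(trailingOnly = TRUE)) {
  tryCatch({
    n <- run_job(args)
    message(sprintf("[%s] INFO rows=%d", format(Sys.time()), n))
    0L
  }, error = function(e) {
    message(sprintf("[%s] ERROR %s", format(Sys.time()), conditionMessage(e)))
    1L
  })
}

# In job.R, end with: quit(status = main())
# Then run: Rscript job.R input.csv 2>> job.log
```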
Characteristics of the Data
Rare events
Large number of categorical features (numerical features are binned)
Features are highly correlated
Some features occur frequently, others rarely
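Binning turns a numeric feature into a categorical one, which can then be expanded into binary columns like any other factor. A minimal sketch with base R `cut`; the feature `age` and the break points are hypothetical choices for illustration.

```r
# Hypothetical numeric feature: user age.
age <- c(15, 23, 37, 45, 62, 80)

# Bin into intervals [-Inf, 18), [18, 30), ...; each bin becomes one level of a factor.
age_bin <- cut(age,
               breaks = c(-Inf, 18, 30, 45, 60, Inf),
               labels = c("<18", "18-29", "30-44", "45-59", "60+"),
               right = FALSE)

# One binary column per bin (no intercept, so no level is dropped).
X <- model.matrix(~ age_bin - 1)
rowSums(X)  # every row has exactly one 1
```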
Common Statistical Models for CTR
Logistic Regression
Gradient Boosted Regression Trees (check xgboost)
Logistic Regression
Linear relationship with features
Fast prediction
Relatively fast fitting
Usually fitted with L2 regularization
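As a rough illustration of fitting with L2 regularization, the sketch below minimizes the logistic log loss plus (lambda/2) * ||w||^2 with base R's `optim` (BFGS, with an analytic gradient). It assumes a small dense feature matrix `X` and 0/1 labels `y`; `l2_logistic_fit` is a hypothetical name, not a function from any package, and it penalizes the intercept too for simplicity.

```r
# Fit logistic regression with an L2 penalty (lambda / 2) * sum(w^2).
# X: dense numeric matrix (include a column of 1s for an intercept), y: 0/1 labels.
# For simplicity the intercept is penalized as well.
l2_logistic_fit <- function(X, y, lambda = 1) {
  objective <- function(w) {
    eta <- as.vector(X %*% w)
    # log(1 + exp(eta)) computed stably as max(eta, 0) + log1p(exp(-|eta|))
    sum(pmax(eta, 0) + log1p(exp(-abs(eta))) - y * eta) + lambda / 2 * sum(w^2)
  }
  gradient <- function(w) {
    p <- plogis(as.vector(X %*% w))
    as.vector(crossprod(X, p - y)) + lambda * w
  }
  optim(rep(0, ncol(X)), objective, gradient, method = "BFGS")$par
}

# Simulated data: logit(P(y = 1)) = -1 + 2 * x.
set.seed(1)
n <- 1000
X <- cbind(1, rnorm(n))
y <- rbinom(n, 1, plogis(-1 + 2 * X[, 2]))

w_l2 <- l2_logistic_fit(X, y, lambda = 1)
w0   <- l2_logistic_fit(X, y, lambda = 0)  # no penalty: should agree with glm
```

With `lambda = 0` the result should match `glm(y ~ X[, 2], family = binomial)` up to optimizer tolerance; larger `lambda` shrinks the weights toward zero.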
How large is the data?
Instances: 10^9
Binary features: 10^5
Subsampling
Sampling is useful for:
  Data exploration
  Code testing
Sampling might harm accuracy (and therefore profit) because of:
  Rare events
  Some features occur frequently and some occur rarely
So far, we do not subsample the data
Sampling
Olivier Chapelle et al. Simple and scalable response prediction for display advertising.
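When sampling is needed despite rare events, one common approach is to keep all positives, subsample only the negatives at rate r, and map the model's predictions back to the original scale. A minimal sketch, assuming 0/1 labels; `subsample_negatives` and `correct_prob` are hypothetical helper names. The correction follows from the odds: if the true odds are p/(1-p), the subsampled data has odds p/((1-p) r).

```r
# Keep every positive (y == 1) and each negative with probability r.
# Returns the row indices to keep. Assumes y is a 0/1 vector.
subsample_negatives <- function(y, r) {
  which(y == 1 | runif(length(y)) < r)
}

# A model fitted on the subsample sees negatives r times less often, so its
# predicted odds q / (1 - q) are inflated by a factor 1 / r. Undo that:
correct_prob <- function(q, r) {
  q / (q + (1 - q) / r)
}

set.seed(3)
y <- c(rep(1, 100), rep(0, 10000))
keep <- subsample_negatives(y, r = 0.1)
table(y[keep])           # all 100 positives, roughly 1000 negatives
correct_prob(0.5, 0.1)   # 1 / 11
```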
Computation
Model Matrix
head(model.matrix(Species ~ ., iris))
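In the line above Species is the response, so the model matrix contains only the intercept and the four numeric columns. To see how a categorical variable is expanded into binary columns, put it on the right-hand side. A minimal sketch using the built-in iris data; choosing Sepal.Length as the response is arbitrary.

```r
# Species (3 levels) on the right-hand side: treatment coding turns it into
# an intercept plus one 0/1 column per non-baseline level.
X <- model.matrix(Sepal.Length ~ Species, iris)
colnames(X)  # "(Intercept)" "Speciesversicolor" "Speciesvirginica"

# Share of zeros among the dummy columns: each row has at most one 1.
mean(X[, -1] == 0)  # 2/3
```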
Dense Matrix
10^9 instances
10^5 binary features
10^14 elements in the model matrix
Size: 4 * 10^14 bytes = 400 TB (at 4 bytes per element)
In-memory access is roughly 10^3 times faster than disk access
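The arithmetic above can be checked directly in R. The 4-bytes-per-element figure matches R integers and logicals; `model.matrix` returns doubles (8 bytes each), which would double the estimate. A minimal sketch:

```r
n_instances <- 1e9
n_features  <- 1e5
n_elements  <- n_instances * n_features  # 1e14 entries

# R stores integers and logicals in 4 bytes, doubles in 8 bytes.
tb <- 1e12
n_elements * 4 / tb  # 400 TB
n_elements * 8 / tb  # 800 TB if stored as double, R's default numeric type

# Sanity check on a small dense integer matrix: about 4 bytes per entry.
small <- matrix(0L, nrow = 1000, ncol = 100)
as.numeric(object.size(small)) / length(small)
```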
R and Large Scale Data
R cannot handle large scale data
R consumes lots of memory
Sparse Matrix
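The number of non-zero elements can be estimated from the number of categorical variables: with one-hot encoding, each categorical variable contributes at most one non-zero per instance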
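To see why the non-zero count tracks the number of categorical variables, the sketch below one-hot encodes a few factors into a sparse matrix with `Matrix::sparseMatrix`. Each row gets exactly one 1 per variable, so the non-zero count is rows times variables regardless of how many levels exist. The toy data frame and the helper `one_hot_sparse` are hypothetical, and the sketch assumes no missing values.

```r
library(Matrix)

# One-hot encode every factor column of d into a sparse matrix:
# each level of each factor gets its own column. Assumes no missing values.
one_hot_sparse <- function(d) {
  nlev   <- vapply(d, nlevels, integer(1))
  offset <- c(0L, cumsum(nlev)[-length(nlev)])  # columns used by earlier variables
  n <- nrow(d)
  i <- rep(seq_len(n), times = ncol(d))         # row index, repeated per variable
  j <- unlist(Map(function(f, off) as.integer(f) + off, d, offset), use.names = FALSE)
  cols <- unlist(Map(function(f, nm) paste0(nm, levels(f)), d, names(d)), use.names = FALSE)
  sparseMatrix(i = i, j = j, x = 1, dims = c(n, sum(nlev)), dimnames = list(NULL, cols))
}

d <- data.frame(
  ad_type     = factor(c("text", "figure", "video", "text")),
  channel     = factor(c("news", "game", "news", "blog")),
  user_region = factor(c("A", "A", "B", "C"))
)
X <- one_hot_sparse(d)
nnzero(X)  # 12 = 4 rows * 3 categorical variables
dim(X)     # 4 rows, 3 + 3 + 3 = 9 columns
```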
Sparse matrix is useful for:
  Large amount of categorical data
  Text analysis
  Tag analysis
R package: Matrix
m1