The Startling Truth: Expert Ad Scoring Software Increases Search Engine Advertising Profitability
DESCRIPTION
"More Than 50% of Search-Based Ads are Not Effective," according to Forrester Research. If the majority of advertisers are doing A/B text ad testing, why are more than half of paid search text ads ineffective? The answer is found in the following two questions:
1. What is the best methodology for determining the minimum number of impressions, clicks and conversions necessary to confidently evaluate the progress of an A/B ad test?
2. What are the total costs of A/B ad testing and why is it important to conclude tests as quickly as possible?
Logic361's AdScoring software provides answers to these questions and shows how to significantly increase the effectiveness of your search engine advertising. Our analysis software was developed by Boston University and Carnegie Mellon University PhD statisticians and mathematicians with "hands-on" search marketing experience. Quickly concluding A/B ad tests and shifting impressions from under-performing ads to ads generating desired results will dramatically increase bottom-line results.
We invite you to review our white paper "The Startling Truth: Search Advertising's A/B Ad Testing's Hidden Costs and Compelling Opportunities." Read about the results our clients have realized and how we developed the methodology to quickly and confidently assess A/B ad tests.
TRANSCRIPT
The Startling Truth:
Expert Ad Scoring Software Increases Search Advertising Profitability
Page 2 The Startling Truth: Expert Ad Scoring Software Increases Search Engine Advertising Profitability
Logic361 (C) Copyright April 2010 All Rights Reserved
Google AdWords and Yahoo Sponsored Search Offer Two Ways to Display Ads: Optimize or Rotate
Optimization
When Google or Yahoo optimizes the display of two or more PPC ads, the search engine
determines which of the ads has the highest click-through rate (CTR). Over
time, it shows the ad with the higher CTR more often than the other ads in the ad group.
Eventually, one of the ads will be shown 95% of the time due
to Google’s and Yahoo’s CTR optimization.
Ad Rotate
Google and Yahoo provide the option to specify that ads be shown in rotation.
For example, if an ad group has three ads, they will show ad 1, then ad 2, then ad 3,
and then begin the rotation again with ad 1. This option
distributes impressions evenly across all 3 ads (for the most part).
If your goal as an advertiser is to determine which ad
generates the greatest number of conversions, the highest
ROI and/or the most profit -- Ad Optimization is the wrong choice.
CTR optimization does not optimize based on conversions
or profitability.
According to Forrester Research:
“More Than 50% of Search-Based Ads are Not Effective.”
Why?
Forrester Research
Forrester Research addressed ad testing issues surrounding the best and worst of paid search in 2009, in a report of the same name. In the report, Forrester judged the success or failure of paid search campaigns using an evaluation system it called the “Search Marketing Review” (hereafter “the Review”).
The Review was Forrester’s objective means of reviewing 300 of the most relevant search terms spread across major industry verticals to diagnose various “search program strengths, defects, and ways to improve effectiveness.”* The report identified ways that paid search campaigns were falling short within specific industry verticals.
According to the Review (which included five common search terms across six different industries and then a qualitative evaluation of the first 10 Google AdWords ads that appeared), more than half of search-based ads failed to be effective.
More than 50% of Search-Based Ads Are Not Effective: Why?
Forrester stated that the ads failed in three main categories:
1. Keywords: they were inefficiently used or not used at all in the ad itself
2. Conversions: the ads failed to screen out irrelevant clickers or move them to take action
3. Landing pages: visitors were taken to pages not relevant to their search, or pages that offered “not enough content or too much detail”*
* Forrester’s report can be purchased at: http://www.forrester.com
Assessing A/B Ad Testing: Dollars & Sense
A study of Logic361’s 73 clients across 11 industries found that the percentages of
non-value-generating ads were consistent with Forrester Research’s findings.
The following is a sample of the data we reviewed (green indicates a positive change
from the previous time period and red a negative change).
Over the last 3 years, Logic361’s software has analyzed over $100 million in search-based
advertising (19 billion impressions, 119 million clicks and 5.2 million conversions).
Our findings include:
• 85% of our clients had 2 or more search ads in the majority of their ad groups.
• Search-based ads were displayed for an average of 77 days regardless of the number of impressions, clicks or conversions.
• 34% of clients’ ad groups had ads that had been running for 110 days or more.
www.logic361.com
If the Majority of Advertisers Are Doing A/B Testing – Why Are More Than Half of Search-Based Ads Ineffective?
The answer to this important question can be found by answering the following two questions:
1. What is the best methodology for determining the minimum number of impressions, clicks and conversions necessary to confidently evaluate the progress of an A/B ad test?
2. What are the total costs of A/B ad testing and why is it important to conclude tests as quickly as possible?
Over the past 9 years, technology solutions have emerged to assist search marketers
with keyword research, automated bidding and account management. Yet ad testing
continues to be measured and managed using inconsistent and contradictory industry
“rules of thumb.”
AdScoring™ Analysis Software:
Dramatically increases the effectiveness of search-based A/B ad testing and increases
profitability.
In plain terms, Logic361 thinks of an “advertising impression” as a coin toss with a low
probability of coming up heads (being clicked). When we compare search-based ads, we are
comparing coins and asking: which one is most likely to yield the most heads in the long term?
The simple approach is to pick the coin that has the highest proportion of heads. We would
like not just to say that one coin is better, but to have some level of confidence in that judgment.
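The coin-toss picture can be tried out with a small simulation. The sketch below is illustrative only: the click probabilities (2% and 3%), the impression counts and the function names are hypothetical, not taken from Logic361's software. It estimates how often the "simple approach" of picking the coin with more heads actually picks the better coin.

```python
import random

def observed_clicks(impressions, click_prob, rng):
    """Toss a biased coin once per impression and count the heads (clicks)."""
    return sum(rng.random() < click_prob for _ in range(impressions))

def share_of_tests_picking_better_ad(impressions, p_worse, p_better, trials, seed=0):
    """Fraction of simulated A/B tests in which the truly better ad
    also shows strictly more clicks than the worse ad."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        worse = observed_clicks(impressions, p_worse, rng)
        better = observed_clicks(impressions, p_better, rng)
        if better > worse:
            wins += 1
    return wins / trials

# Hypothetical click probabilities: 2% versus 3%.
small_test = share_of_tests_picking_better_ad(200, 0.02, 0.03, trials=200)
large_test = share_of_tests_picking_better_ad(5000, 0.02, 0.03, trials=200)
print(f"200 impressions per ad:   {small_test:.0%} of tests pick the better ad")
print(f"5,000 impressions per ad: {large_test:.0%} of tests pick the better ad")
```

With few impressions the higher observed proportion is an unreliable guide, which is why a confidence measure is needed in addition to the raw comparison.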
The answer to dramatically increasing the effectiveness of search-based advertising is
AdScoring™.
For example, in the following 30-day A/B ad test (actual results), which ad is
the most effective and why?
The answer is the Champion (A) ad, based on the number of conversions and
cost per conversion.
What if you knew after 7 days that the Champion Ad (A) had a higher
statistical probability of being more effective than the Challenger Ad (B)?
Would you conclude the test? The answer is “yes,” especially given the
significant difference in effectiveness.
After 7 days, Logic361’s ad scoring algorithm determined that there was a
72% probability that the Champion Ad (A) would outperform the Challenger
Ad (B), and by the 30th day the probability had increased to 93%.
What was the opportunity cost of not concluding the test at the end of 7
days? What would have been the impact of correspondingly shifting the
Challenger Ad impressions to the Champion Ad?
The Effectiveness of “Shifting” Impressions from “B” to “A”
Had the search marketer concluded the test after 7 days, the result would
have been a 25% increase in conversions (from 32 to 40) and a 31%
decrease in cost per conversion ($31.33 versus $45.38).
                  Impressions  Clicks  Click Rate  Conversions  Conversion Rate  Cost Per Click  Spend   Cost Per Conversion
Challenger (B)    131,995      1,036   0.78%       11           1.06%            $0.77           $795    $72.24
Champion (A)      110,748      823     0.74%       21           2.55%            $0.80           $657    $31.28
Overall Results   242,743      1,859               32                                            $1,452  $45.36
                             Impressions  Clicks  Click Rate  Conversions  Conversion Rate  Cost Per Click  Spend   Cost Per Conversion
Shifted From Challenger (B)  101,636      752     0.74%       19           2.55%            $0.80           $602    $31.37
Champion (A)                 110,748      823     0.74%       21           2.55%            $0.80           $657    $31.28
Overall Results              212,384      1,575               40                                            $1,259  $31.33
Previous Overall Results     242,743      1,859               32                                            $1,452  $45.38
Net Conversion Gain: 8       Net Cost Per Conversion Decrease: ($14.05)
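The two percentages quoted for the shifted test follow from the "Overall Results" and "Previous Overall Results" rows. As a quick check of the arithmetic, using the figures as rounded in the paper:

```latex
\[
\frac{40 - 32}{32} = 25\%,
\qquad
\frac{\$45.38 - \$31.33}{\$45.38} = \frac{\$14.05}{\$45.38} \approx 31\%
\]
```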
The Hidden Cost of Not Concluding A/B Ad
Tests Quickly
The financial impact of making a decision sooner would have been
an 85% increase in contribution margin.
Customer Case Study
When Logic361’s AdScoring™ is applied to an entire search advertising
account, dramatic financial results can be achieved.
Using Logic361’s software, our professional services consultants
analyzed 4,000 ad groups for an international retailer with 210 online stores.
Our software identified the one or two best-performing ads per ad group
(based on revenue and client profit targets) and modeled the impact of
pausing 37% of the low or non-performing ad inventory and shifting over 3
million under-monetized monthly impressions.
                                     With AdScoring  Without AdScoring
                                     (40 Orders)     (32 Orders)
Revenue (Average Order $247.00)      $9,880          $7,904
Cost of Goods Sold (70%)             $6,916          $5,533
Search Advertising Cost              $1,259          $1,452
Contribution Margin                  $1,705          $919
Contribution Margin Increase (85%)   $786
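The contribution margin figures can be reproduced from the order counts, average order value, cost-of-goods rate and advertising spend shown in the table. The following is a minimal sketch; the function and variable names are illustrative and not part of Logic361's software.

```python
def contribution_margin(orders, average_order_value, cogs_rate, ad_cost):
    """Revenue minus cost of goods sold minus search advertising cost."""
    revenue = orders * average_order_value
    cost_of_goods_sold = revenue * cogs_rate
    return revenue - cost_of_goods_sold - ad_cost

# Inputs taken from the table: $247.00 average order, 70% cost of goods sold.
with_adscoring = contribution_margin(40, 247.00, 0.70, 1259)
without_adscoring = contribution_margin(32, 247.00, 0.70, 1452)
increase = with_adscoring - without_adscoring

# Rounded to whole dollars these match the table: 1705, 919 and 786.
print(round(with_adscoring), round(without_adscoring), round(increase))
print(f"Increase: {increase / without_adscoring:.1%}")
```

The relative increase works out to roughly 85%, the figure quoted for concluding the test sooner.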
The actual results achieved for this client were:
• A 26% increase (536) in monthly conversions (from 2,063 to 2,599)
• A 37% decrease in average cost per conversion (from $35.54 to $22.39),
which equated to an advertising cost savings of $34,176 per month
(based on the increased conversions)
In the third week following the pausing of the low and non-performing ad
inventory, the client realized its highest week of sales in the history
of the company.
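One way to reproduce the monthly savings figure is to apply the per-conversion cost reduction to the new monthly conversion volume. This reading is inferred from the numbers and from the phrase "based on the increased conversions"; the paper does not spell out the calculation.

```latex
\[
(\$35.54 - \$22.39) \times 2{,}599 = \$13.15 \times 2{,}599 = \$34{,}176.85
\]
```

This agrees with the reported $34,176 once the cents are dropped.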
Summary
Why Should You Use Ad Scoring to Improve A/B Ad Testing?
Writing and determining effective ad copy is arguably the most important
responsibility a search marketer has. The effectiveness of search ads is the
keystone of search based advertising ROI. Given its importance, it’s
surprising that statistical ad scoring is not a standard practice on par with
bid management.
Logic361 is the first company (to our knowledge) to develop a search-based
ad scoring solution capable of systematically analyzing hundreds of
thousands of simultaneous A/B ad tests. The results that have been achieved
demonstrate that assessing ad inventory, scoring A/B ad tests and
redistributing under-monetized impressions can generate dramatic bottom-line
results.
We developed our ad scoring solution with two goals in mind. First, we
wanted to be 100% confident in our scoring methodology. Second, we
wanted the scoring algorithm/methodology to be precise but not so complex
that it was a “black box” that could not be easily understood and explained.
The addendum provides the opportunity to review our methodology. We
welcome your questions and comments.
About Logic361
Logic361 AdScoring™ software speeds decision-making and empowers search engine advertisers with the unprecedented capability to quickly assess, prioritize and respond to changes in paid search advertising performance. The Company’s data-driven, scientific approach combines automated analyses, results-orientated prioritization and advanced decision-making methodologies within a single, powerful application.
For information, or to schedule an analysis of your ad inventory, contact: Stephen Schramke [email protected] (206) 842-0747
Copyright © 2010 Logic361 Corporation. Logic361™, the Logic361 logo, and AdScoring™ are trademarks of Logic361 Corporation that may be registered in some jurisdictions. All other company and product names are the property of their respective owners.
All rights reserved worldwide. No part of this publication may be reproduced, transmitted, transcribed, stored in a retrieval system, or translated into any human or computer language in any form or by any means without the express written permission of:
Logic361 Corporation 93 S Jackson Street, Suite 22340 Seattle, WA 98104
This publication is provided as is without warranty of any kind, express or implied, including, but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. This publication could include technical inaccuracies or typographical errors. Changes are periodically added to the information herein. These changes will be incorporated in new editions of the publication. Logic361 Corporation may make improvements and/or changes at any time to the product(s) and/or the program(s) described in this publication.
All terms mentioned in this publication that are known to be trademarks or service marks have been appropriately capitalized. Use of a term in this publication should not be regarded as affecting the validity of any trademark or service mark.
Addendum
How Logic361 Developed a Solution for Assessing Search-Based A/B Ad Tests
Logic361’s software engineering team partnered with the Dataspora company to develop a programmable statistical methodology for assessing paid search ad performance.
The Dataspora team was led by Michael E. Driscoll, who has a decade of experience developing large-scale databases and data mining algorithms within industry, government, and academic institutions. Michael has a Ph.D. in Bioinformatics from Boston University and an A.B. from Harvard University.
Michael was assisted by John Mount, Ph.D. John is an expert in web-scale algorithms and statistics. His interests include optimization, and his recent positions include directing research at online retailer Shopping.com. John has a Ph.D. in Computer Science from Carnegie Mellon University and an A.B. in Mathematics from U.C. Berkeley.
Logic361’s Statistical Methodology For Assessing Ad Performance
Comparing the Click Rate Performance of Two Ads
When we compare the click rate performance of two ads, we are comparing two binomial distributions and asking: which one is better? The graphic below shows two curves: (i) a champion ad in blue with a 5% click rate based on 10,000 impressions, and (ii) a challenger ad in green with a 7% click rate based on 100 impressions. We know the challenger ad is performing better on a click rate basis, but because we have only 100 data points, we can’t say for certain that this is not a result of chance.
What we’d like is a measure that tells us the likelihood that our challenger ad’s click rate would end up lower than the champion ad’s by chance: if we took samples from the blue and green distributions, how often would green be higher? Unfortunately, calculating this for two binomial distributions is a non-trivial task. Fortunately, calculating it for two normal distributions is easier and, more relevantly, can be implemented programmatically in a straightforward way. We start by approximating the binomial distribution with a normal distribution, with mean and variance given by:
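In the standard textbook form, which is assumed here because it matches the definitions of N and p that follow, a binomial count of clicks is approximated by a normal distribution with the first pair of parameters below. The second pair is the same approximation expressed for the click rate itself (the count divided by N):

```latex
\[
\mu = N p, \qquad \sigma^2 = N p\,(1 - p)
\]
\[
\mu_{\text{rate}} = p, \qquad \sigma_{\text{rate}}^2 = \frac{p\,(1 - p)}{N}
\]
```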
Where N is the sample size (or in this case, impressions), and p is the
success rate (click rate).
As seen below, when the number of impressions N is large this
approximation is excellent (blue is binomial, red is normal) but not so when
N is much smaller than 100.
The difference between two independent, normally distributed variables X and Y is
itself normally distributed, with mean and variance given by:
We can express this in terms of two sets of binomial parameters, N and p, as
follows:
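Written out in standard form, and assuming that the two ads' impressions are independent and that the comparison is made on click rates (clicks divided by impressions), the two results are as below. The subscripts X and Y for the two ads are notation chosen here for illustration:

```latex
\[
X - Y \sim \mathcal{N}\!\left(\mu_X - \mu_Y,\; \sigma_X^2 + \sigma_Y^2\right)
\]
\[
\mu_{X-Y} = p_X - p_Y,
\qquad
\sigma_{X-Y}^2 = \frac{p_X\,(1 - p_X)}{N_X} + \frac{p_Y\,(1 - p_Y)}{N_Y}
\]
```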
We now have a probability distribution that describes, given the sample sizes
and click rates of two ads, the likelihood of seeing a given margin of
difference by chance.
For our basic question (how confident are we of the observed difference between the challenger and champion ads?) we can calculate a p value by way of the z statistic.
The z statistic is a normalized measurement of deviation from the mean: for a given value in a distribution, it’s the value’s distance from the mean divided by the standard deviation. Thus z scores have a zero mean and a standard deviation of 1: the standard normal distribution. This is relevant because, programmatically, once we have z scores we can easily convert them into p values: percentages that say “there’s a 95% chance the champion or challenger ad is better.” z scores are calculated as:
Where the mean variance is defined as:
We can convert z scores into p values using the cumulative distribution function
of the normal distribution. Given a value, it returns the probability of observing a value at or below it.
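The z-score step can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than Logic361's implementation: it uses the unpooled variance of the difference in click rates (the paper's own "mean variance" definition is not reproduced here), and the function names are hypothetical. The inputs are the champion and challenger figures from the example above.

```python
import math

def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def challenger_confidence(n_champion, p_champion, n_challenger, p_challenger):
    """Return (z, confidence) that the challenger's true click rate is higher.

    Assumption: unpooled variance of the difference between two click rates.
    """
    variance = (p_champion * (1 - p_champion) / n_champion
                + p_challenger * (1 - p_challenger) / n_challenger)
    z = (p_challenger - p_champion) / math.sqrt(variance)
    return z, normal_cdf(z)

# Champion: 5% click rate on 10,000 impressions.
# Challenger: 7% click rate on 100 impressions.
z, confidence = challenger_confidence(10_000, 0.05, 100, 0.07)
print(f"z = {z:.2f}, confidence = {confidence:.0%}")  # roughly z = 0.78, 78%
```

Despite the challenger's higher observed click rate, the small sample leaves the confidence well short of 95%. With only 100 impressions the normal approximation is also at the edge of where it is described as reliable.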
When 2 Ads Have a Small Number of Clicks: Fisher’s Exact Test
When our sample sizes are small (as is more often the case with clicks
than with impressions), we calculate our confidence metric (that a
challenger is outperforming a champion or vice versa) using Fisher’s Exact
Test. We use Fisher’s Exact Test when the sum of observations is less than
20; for N > 20 the test becomes computationally infeasible to perform on a
standard server platform.
Fisher’s Exact Test relies on exhaustively calculating all possible
outcomes and identifying those that match or exceed our observed difference between challenger and champion impressions and clicks. This
fraction represents the probability that our difference could have occurred by chance (one minus this fraction is thus our confidence, or p-value).
First we define the five standard Fisher quantities (Fisher’s test is usually
applied to 2 x 2 tables) as:
We can then calculate our p value as follows:
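As an illustration of the exhaustive enumeration described above, here is a minimal one-sided Fisher's Exact Test in Python. The cell names a, b, c, d and the total n are the conventional labels for a 2 x 2 table and are assumed here; the click and conversion counts are hypothetical. The paper's exact formulation may differ.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's Exact Test for the 2 x 2 table [[a, b], [c, d]].

    Returns the probability, with row and column totals held fixed, of a
    first-row success count at least as large as the observed a.
    """
    row1, row2 = a + b, c + d
    col1 = a + c                      # total successes across both ads
    n = row1 + row2
    total_tables = comb(n, col1)
    upper = min(row1, col1)
    favourable = sum(comb(row1, k) * comb(row2, col1 - k)
                     for k in range(a, upper + 1))
    return favourable / total_tables

# Hypothetical small test: ad X has 9 clicks with 5 conversions,
# ad Y has 8 clicks with 1 conversion (17 observations in total).
p_chance = fisher_one_sided(5, 4, 1, 7)
confidence = 1 - p_chance
print(f"chance probability = {p_chance:.3f}, confidence = {confidence:.3f}")
```

The sum runs over every table at least as extreme as the observed one, which is the "match or exceed" fraction described in the text. One minus that fraction is the confidence value.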
In the figure below, we show the implicit distributions for the click rates on
two ads, with 1.88% CTR and 2.53% CTR, respectively. Based on these
underlying distributions, we have 35% confidence that the higher ad (in
red) will outperform the lower (in blue) going forward.
This diagram illustrates our approach visually: we can estimate our
confidence value by looking at the amount of overlap between our distributions.
The more overlap, the less confident we are that the higher one is truly higher.
We can quantify this value exactly as:
Since the functions are probability distribution functions, we know that the
red area sums to 1, so the expression reduces to simply 1 - area(intersection). In our
example, about 65% of the red area is in the overlap, thus only 35% is
strictly greater.
In general, we can calculate the area of overlap between the probability
distribution functions of two unimodal random variables X and Y via integration, as follows:
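A numerical version of the overlap calculation might look like the sketch below. It assumes the overlap is the integral of the pointwise minimum of the two density functions, uses normal densities for simplicity, and approximates the integral with a midpoint sum. The parameter values are hypothetical, and none of this is taken from Logic361's code.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def overlap_area(mu_x, sigma_x, mu_y, sigma_y, steps=20_000):
    """Area under min(f_X, f_Y), approximated with a midpoint Riemann sum."""
    lo = min(mu_x - 8 * sigma_x, mu_y - 8 * sigma_y)
    hi = max(mu_x + 8 * sigma_x, mu_y + 8 * sigma_y)
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += min(normal_pdf(x, mu_x, sigma_x), normal_pdf(x, mu_y, sigma_y)) * dx
    return total

# Hypothetical click-rate distributions for two ads (mean, standard deviation).
overlap = overlap_area(0.020, 0.004, 0.026, 0.004)
confidence = 1 - overlap  # the paper's 1 - area(intersection)
print(f"overlap = {overlap:.2f}, confidence = {confidence:.2f}")
```

Identical distributions give an overlap of 1 (no confidence that either is higher), and the overlap shrinks toward 0 as the distributions move apart.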
The following are additional examples:
Calculating the Confidence Metric for the Cost Distributions
We have observable data for impressions, clicks, and acquisitions. Average
cost per click is a constant (for a given ad). Given our observable data, we
first (i) infer the implicit distributions from which these values are drawn
(sometimes called the posterior distribution), and (ii) compare two
distributions and state a level of confidence that, for future observations,
one will remain higher than another.
We then extend our approach of analyzing probability functions to estimating
differences between other metrics, such as cost-per-acquisition, cpa. The
cpa is a function that depends on a random variable for acquisitions, a.
Given a pair of cpa measures for two ads, call them cpaX and cpaY, we can
generate distribution functions that are conditioned on a, the number of
acquisitions.
Where cpa(a) is a function defined (from our table) as:
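Judging by the worked example in the variable table (cost per click $2.00, 1,100 clicks, 22 acquisitions, cost per acquisition $100), the definition appears to be total spend divided by acquisitions. The form below is inferred from those figures and should be read as an assumption:

```latex
\[
\mathrm{cpa}(a) = \frac{\mathrm{cpc} \cdot c}{a},
\qquad \text{e.g.}\quad \frac{\$2.00 \times 1{,}100}{22} = \$100
\]
```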
And because cpc and c are constants, we only integrate over all the possible
values of a. We perform the same analysis for spend to value, plugging in
the values from our table, and integrating over all possible values of c while
holding other variables constant.
Variable              Symbol  Derivation  Example   Posterior Probability Distribution
Impressions           i                   100,000
Clicks                c                   1,100     binom(i, ctr)
Acquisitions          a                   22        binom(c, ar)
Click-thru rate       ctr                 1.1%      beta(i-c, c)
Acquisition rate      ar                  2.0%      beta(c-a, a)
Cost per click        cpc                 $2.00
Cost per acquisition  cpa                 $100
Total spend                               $2,200
Total value                               $3,300
Spend to value                            66.0%
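To illustrate how two cost-per-acquisition distributions might be compared, the sketch below swaps the integration described above for Monte Carlo sampling, which is simpler to write down. It assumes a Beta(a + 1, c - a + 1) posterior for the acquisition rate (a uniform prior), which may not match the paper's parameterisation. The second ad's figures and all function names are hypothetical.

```python
import random

def sample_cpa(clicks, acquisitions, cost_per_click, rng):
    """Draw one plausible cost per acquisition for an ad.

    Assumption: the acquisition rate has a Beta(a + 1, c - a + 1) posterior.
    Since cpa = cpc * c / a and the rate is a / c, cpa = cpc / rate.
    """
    rate = rng.betavariate(acquisitions + 1, clicks - acquisitions + 1)
    return cost_per_click / rate

def prob_lower_cpa(ad_x, ad_y, draws=20_000, seed=0):
    """Estimated probability that ad_x has a lower cost per acquisition than ad_y.

    Each ad is a (clicks, acquisitions, cost_per_click) tuple.
    """
    rng = random.Random(seed)
    wins = sum(sample_cpa(*ad_x, rng) < sample_cpa(*ad_y, rng)
               for _ in range(draws))
    return wins / draws

table_ad = (1100, 22, 2.00)         # figures from the variable table
hypothetical_ad = (1100, 44, 2.00)  # invented comparison ad
confidence = prob_lower_cpa(hypothetical_ad, table_ad)
print(f"Confidence the hypothetical ad has the lower cpa: {confidence:.1%}")
```

Because cost per click and clicks are held constant for each ad, all of the uncertainty in cpa comes from the acquisition count, which mirrors the remark that only a needs to be integrated over.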