The Startling Truth: Expert Ad Scoring Software Increases Search Engine Advertising Profitability

Logic361 (C) Copyright April 2010 All Rights Reserved

Google AdWords and Yahoo Sponsored Search Offer Two Ways to Display Ads: Optimize or Rotate

Optimization

When Google or Yahoo optimizes the display of two or more PPC ads, the search engine determines which of the ads has the highest click-through rate (CTR). Over time, it shows the ad with the higher CTR more often than the other ads in the ad group. Eventually, one of the ads will be shown 95% of the time due to Google’s and Yahoo’s CTR optimization.

Ad Rotate

Google and Yahoo provide the option to specify that ads be shown in rotation. For example, if an ad group has three ads, they will show ad 1, then ad 2, then ad 3, and then begin the rotation again with ad 1. This option distributes impressions evenly across all 3 ads (for the most part).

If your goal as an advertiser is to determine which ad generates the greatest number of conversions, the highest ROI and/or the most profit, Ad Optimization is the wrong choice. CTR optimization does not optimize based on conversions or profitability.

According to Forrester Research:

“More Than 50% of Search-Based Ads are Not Effective.”

Why?



Forrester Research

Forrester Research addressed ad testing issues surrounding the best and worst of paid search in 2009, in a report of the same name. In the report, Forrester judged the success or failure of paid search campaigns using a scoring system it called the “Search Marketing Review” (hereafter “the Review”).

The Review was Forrester’s objective means of reviewing 300 of the most relevant search terms spread out among major industry verticals to diagnose various “search program strengths, defects, and ways to improve effectiveness.”* The report identified ways that paid search campaigns were falling short throughout specific industry verticals.

According to the Review (which included five common search terms across six different industries and then a qualitative evaluation of the first 10 Google AdWords ads that appeared), more than half of search-based ads failed to be effective.

More than 50% of Search-Based Ads Are Not Effective: Why?

Forrester stated that the ads failed in three main categories:

1. Keywords: they were used inefficiently, or not at all, in the ad itself
2. Conversions: the ads failed to screen out irrelevant clickers or to move them to take action
3. Landing pages: visitors were taken to pages not relevant to their search, or to pages that offered “not enough content or too much detail”*

* Forrester’s report can be purchased at: http://www.forrester.com



Assessing A/B Ad Testing: Dollars & Sense

A study of Logic361’s 73 clients across 11 industries found that the percentages of non-value-generating ads were consistent with Forrester Research’s findings. The following is a sample of the data we reviewed (green indicates a positive change from the previous time period and red a negative change).

Over the last 3 years, Logic361’s software has analyzed over $100 million in search-based advertising (19 billion impressions, 119 million clicks and 5.2 million conversions). Our findings include:

• 85% of our clients had 2 or more search ads in the majority of their ad groups
• Search-based ads were displayed for an average of 77 days, regardless of the number of impressions, clicks or conversions
• 34% of clients’ ad groups had ads that had been running for 110 days or more

www.logic361.com



If the Majority of Advertisers Are Doing A/B Testing, Why Are More Than Half of Search-Based Ads Ineffective?

The answer to this important question can be found by answering the following two questions:

1. What is the best methodology for determining the minimum number of impressions, clicks and conversions necessary to confidently evaluate the progress of an A/B ad test?
2. What are the total costs of A/B ad testing and why is it important to conclude tests as quickly as possible?

Over the past 9 years, technology solutions have emerged to assist search marketers with keyword research, automated bidding and account management. Yet ad testing continues to be measured and managed using inconsistent and contradictory industry “rules of thumb.”

AdScoring™ Analysis Software: Dramatically increases the effectiveness of search-based A/B ad testing and increases profitability.

In plain terms, Logic361 thinks of an “advertising impression” as a coin toss with a low probability of coming up heads (being clicked). When we compare search-based ads, we are comparing coins and asking: which one is most likely to yield the most heads in the long term?

The simple approach is to pick the coin that has the highest proportion of heads. We would like to be able to say not just that one coin is better, but to have some level of confidence in our judgment.

The answer to dramatically increasing the effectiveness of search-based advertising is AdScoring™.




For example, in the following 30-day A/B ad test (actual results), which ad is the most effective and why?

The answer is the Champion Ad (A), based on the number of conversions and cost per conversion.

What if you knew after 7 days that the Champion Ad (A) had a higher statistical probability of being more effective than the Challenger Ad (B)? Would you conclude the test? The answer is “yes,” especially given the significant differences in effectiveness.

After 7 days, Logic361’s ad scoring algorithm determined that there was a 72% probability that the Champion Ad (A) would outperform the Challenger Ad (B), and by the 30th day the probability had increased to 93%.

What was the opportunity cost of not concluding the test at the end of 7 days? What would have been the impact of shifting the Challenger Ad’s impressions to the Champion Ad?

The Effectiveness of “Shifting” Impressions from “B” to “A”

Had the search marketer concluded the test after 7 days, the result would have been a 25% increase in conversions (from 32 to 40) and a 31% decrease in cost per conversion ($31.33 versus $45.38).

                 Impressions  Clicks  Click Rate  Conversions  Conversion Rate  Cost Per Click  Spend   Cost Per Conversion
Challenger (B)   131,995      1,036   0.78%       11           1.06%            $0.77           $795    $72.24
Champion (A)     110,748      823     0.74%       21           2.55%            $0.80           $657    $31.28
Overall Results  242,743      1,859               32                                            $1,452  $45.36

                             Impressions  Clicks  Click Rate  Conversions  Conversion Rate  Cost Per Click  Spend   Cost Per Conversion
Shifted From Challenger (B)  101,636      752     0.74%       19           2.55%            $0.80           $602    $31.37
Champion (A)                 110,748      823     0.74%       21           2.55%            $0.80           $657    $31.28
Overall Results              212,384      1,575               40                                            $1,259  $31.33
Previous Overall Results     242,743      1,859               32                                            $1,452  $45.38

Net Conversion Gain: 8
Net Cost Per Conversion Decrease: ($14.05)
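The arithmetic behind the two tables can be checked with a short script. This is a minimal sketch: the function name `summarize` and the dictionary layout are illustrative assumptions, and the inputs are the rounded figures printed in the tables, so results can differ slightly from the published ones.

```python
def summarize(rows):
    """Total conversions and spend for a list of ad rows, plus cost per conversion."""
    conversions = sum(r["conversions"] for r in rows)
    spend = sum(r["spend"] for r in rows)
    return {"conversions": conversions, "spend": spend,
            "cost_per_conversion": spend / conversions}

# Rounded figures copied from the tables above.
original = summarize([
    {"ad": "Challenger (B)", "conversions": 11, "spend": 795},
    {"ad": "Champion (A)", "conversions": 21, "spend": 657},
])
shifted = summarize([
    {"ad": "Shifted from Challenger (B)", "conversions": 19, "spend": 602},
    {"ad": "Champion (A)", "conversions": 21, "spend": 657},
])

conversion_gain = shifted["conversions"] - original["conversions"]
conversion_lift = conversion_gain / original["conversions"]
cpa_change = shifted["cost_per_conversion"] / original["cost_per_conversion"] - 1

print(f"conversions: {original['conversions']} -> {shifted['conversions']} "
      f"({conversion_lift:+.0%})")
print(f"cost per conversion: ${original['cost_per_conversion']:.2f} -> "
      f"${shifted['cost_per_conversion']:.2f} ({cpa_change:+.0%})")
```

With the rounded inputs the sketch reproduces the 25% conversion gain and a decrease in cost per conversion of about 31%. The shifted cost per conversion comes out near $31.5 rather than $31.33, presumably because the published table was computed from unrounded conversions and spend.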




The Hidden Cost of Not Concluding A/B Ad Tests Quickly

The financial impact of making a decision sooner would have been an 85% increase in contribution margin.

Customer Case Study

When Logic361’s AdScoring™ is applied to an entire search advertising account, dramatic financial results can be achieved.

Using Logic361’s software, our professional services consultants analyzed 4,000 ad groups for an international retailer with 210 online stores. Our software identified the one or two best-performing ads per ad group (based on revenue and client profit targets) and modeled the impact of pausing 37% of the low- or non-performing ad inventory and shifting over 3 million under-monetized monthly impressions.

Contribution margin for the 30-day A/B test, with and without shifting impressions:

                                    With AdScoring  Without AdScoring
                                    (40 Orders)     (32 Orders)
Revenue (Average Order $247.00)     $9,880          $7,904
Cost of Goods Sold (70%)            $6,916          $5,533
Search Advertising Cost             $1,259          $1,452
Contribution Margin                 $1,705          $919
Contribution Margin Increase (85%)  $786
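The contribution margin figures in the table above follow from three inputs: orders, average order value and search advertising cost. A minimal sketch of that calculation is shown here; the function and constant names are illustrative assumptions, not part of Logic361's software.

```python
AVERAGE_ORDER_VALUE = 247.00   # dollars per order, from the table
COGS_RATE = 0.70               # cost of goods sold as a share of revenue

def contribution_margin(orders, ad_cost):
    """Revenue minus cost of goods sold minus search advertising cost."""
    revenue = orders * AVERAGE_ORDER_VALUE
    return revenue - revenue * COGS_RATE - ad_cost

with_scoring = contribution_margin(orders=40, ad_cost=1259)
without_scoring = contribution_margin(orders=32, ad_cost=1452)
increase = with_scoring - without_scoring

print(f"with: ${with_scoring:,.0f}  without: ${without_scoring:,.0f}")
print(f"increase: ${increase:,.0f} ({increase / without_scoring:.1%})")
```

The margin rises by roughly 85% even though orders rise by only 25%, because the extra orders arrive together with a lower advertising bill.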




The actual results achieved for this client were:

• A 26% increase (536) in monthly conversions (from 2,063 to 2,599)
• A 37% decrease in average cost per conversion (from $35.54 to $22.39), which equated to an advertising cost savings of $34,176 per month (based on the increased conversions)

In the third week after the low- and non-performing ad inventory was paused, the client recorded the highest week of sales in the company’s history.
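The case study percentages and the monthly savings figure can be reproduced from the four numbers quoted above. The sketch below assumes that "based on the increased conversions" means the per-conversion saving applied to the new monthly conversion volume; the variable names are illustrative.

```python
conversions_before, conversions_after = 2063, 2599
cpa_before, cpa_after = 35.54, 22.39   # average cost per conversion, in dollars

conversion_lift = conversions_after / conversions_before - 1
cpa_drop = 1 - cpa_after / cpa_before
# Assumed reading: per-conversion saving times the new conversion volume.
monthly_savings = (cpa_before - cpa_after) * conversions_after

print(f"{conversion_lift:.0%} more conversions, {cpa_drop:.0%} lower cost per conversion")
print(f"monthly savings of about ${monthly_savings:,.0f}")
```

Under that assumption the result agrees with the published $34,176 to within a dollar.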

Summary

Why Should You Use Ad Scoring to Improve A/B Ad Testing?

Writing and identifying effective ad copy is arguably the most important responsibility a search marketer has. The effectiveness of search ads is the keystone of search-based advertising ROI. Given its importance, it’s surprising that statistical ad scoring is not a standard practice on par with bid management.

Logic361 is the first company (to our knowledge) to develop a search-based ad scoring solution capable of systematically analyzing hundreds of thousands of simultaneous A/B ad tests. The results achieved demonstrate that assessing ad inventory, scoring A/B ad tests and redistributing under-monetized impressions can generate dramatic bottom-line results.

We developed our ad scoring solution with two goals in mind. First, we wanted to be 100% confident in our scoring methodology. Second, we wanted the scoring algorithm/methodology to be precise but not so complex that it was a “black box” that could not be easily understood and explained.

The addendum provides the opportunity to review our methodology. We welcome your questions and comments.




About Logic361

Logic361 AdScoring™ software speeds decision-making and empowers search engine advertisers with the unprecedented capability to quickly assess, prioritize and respond to changes in paid search advertising performance. The Company’s data-driven, scientific approach combines automated analyses, results-orientated prioritization and advanced decision-making methodologies within a single, powerful application.

For information, or to schedule an analysis of your ad inventory, contact: Stephen Schramke [email protected] (206) 842-0747

Copyright © 2010 Logic361 Corporation. Logic361™, the Logic361 logo, and AdScoring™ are trademarks of Logic361 Corporation that may be registered in some jurisdictions. All other company and product names are the property of their respective owners.

All rights reserved worldwide. No part of this publication may be reproduced, transmitted, transcribed, stored in a retrieval system, or translated into any human or computer language in any form or by any means without the express written permission of:

Logic361 Corporation 93 S Jackson Street, Suite 22340 Seattle, WA 98104

This publication is provided as is without warranty of any kind, express or implied, including, but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. This publication could include technical inaccuracies or typographical errors. Changes are periodically added to the information herein. These changes will be incorporated in new editions of the publication. Logic361 Corporation may make improvements and/or changes at any time to the product(s) and/or the program(s) described in this publication.

All terms mentioned in this publication that are known to be trademarks or service marks have been appropriately capitalized. Use of a term in this publication should not be regarded as affecting the validity of any trademark or service mark.




Addendum

How Logic361 Developed a Solution for Assessing Search-Based A/B Ad Tests

Logic361’s software engineering team partnered with the Dataspora company to develop a programmable statistical methodology for assessing paid search ad performance.

The Dataspora team was led by Michael E. Driscoll, who has a decade of experience developing large-scale databases and data mining algorithms within industry, government, and academic institutions. Michael has a Ph.D. in Bioinformatics from Boston University and an A.B. from Harvard University.

Michael was assisted by John Mount, Ph.D. John is an expert in web-scale algorithms and statistics. His interests include optimization, and his recent positions include directing research at online retailer Shopping.com. John has a Ph.D. in Computer Science from Carnegie Mellon University and an A.B. in Mathematics from U.C. Berkeley.




Logic361’s Statistical Methodology for Assessing Ad Performance

Comparing the Click Rate Performance of Two Ads

When we compare the click rate performance of two ads, we are comparing two binomial distributions and asking: which one is better? The graphic below shows two curves: (i) a champion ad in blue with a 5% click rate based on 10,000 impressions, and (ii) a challenger ad in green with a 7% click rate based on 100 impressions. We know the challenger ad is performing better on a click rate basis, but because we have only 100 data points, we can’t say for certain that this is not a result of chance.

What we’d like is a measure that tells us the likelihood that our challenger ad’s click rate would end up lower than the champion ad’s by chance; if we took samples from the blue and green distributions, how often would green be higher? Unfortunately, calculating this for two binomial distributions is a non-trivial task. Fortunately, calculating this for two normal distributions is easier and, more relevantly, can be implemented programmatically in a straightforward way. We start by approximating the binomial distribution with a normal distribution with mean and variance given by:
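Assuming the textbook normal approximation to the binomial is the one meant here, a click count X from N impressions, and the corresponding click rate X/N, have:

```latex
\[
X \sim \mathrm{Binomial}(N, p) \;\approx\; \mathcal{N}\!\left(\mu,\ \sigma^2\right),
\qquad \mu = N p, \qquad \sigma^2 = N p\,(1 - p)
\]
\[
\frac{X}{N} \;\approx\; \mathcal{N}\!\left(p,\ \frac{p\,(1 - p)}{N}\right)
\]
```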




Where N is the sample size (in this case, impressions) and p is the success rate (click rate).

As seen below, when the number of impressions N is large this approximation is excellent (blue is binomial, red is normal), but not when N is much smaller than 100.

The difference between two independent, normally distributed variables X and Y is itself normally distributed, with mean and variance given by:
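For independent X and Y, the standard result (written in generic notation, which is an assumption of this sketch) is:

```latex
\[
X - Y \;\sim\; \mathcal{N}\!\left(\mu_X - \mu_Y,\ \sigma_X^2 + \sigma_Y^2\right)
\]
```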

We can express this in terms of two sets of binomial parameters, N and p, as follows:
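Taking X and Y to be the click rates of the two ads, each approximately normal with mean p and variance p(1 - p)/N, the difference has the parameters below. The rate form is an assumption, chosen because the two ads generally have different impression counts; the subscripts X and Y label the two ads.

```latex
\[
\mu_{X-Y} = p_X - p_Y,
\qquad
\sigma_{X-Y}^2 = \frac{p_X\,(1 - p_X)}{N_X} + \frac{p_Y\,(1 - p_Y)}{N_Y}
\]
```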

We now have a probability distribution that describes, given the sample sizes and click rates of two ads, the likelihood of seeing a given margin of difference by chance.




For our basic question (how confident are we of the observed difference between the challenger and champion ads?), we can calculate a p value by way of the z statistic.

The z statistic is a normalized measurement of deviation from the mean: for a given value in a distribution, it’s the value’s distance from the mean divided by the standard deviation. Thus z scores have a mean of zero and a standard deviation of 1: the standard normal distribution. This is relevant because programmatically, once we have z scores, we can easily convert them into p values: percentages that say “there’s a 95% chance the champion or challenger ad is better.” z scores are calculated as:

Where the mean variance is defined as:

We can convert z scores into p values using the cumulative distribution function of the normal distribution. Given a value, it returns the corresponding cumulative probability.
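The z-score calculation described above can be sketched in a few lines. This is an illustration under stated assumptions, not Logic361's implementation: it uses an unpooled variance (the sum of the two ads' rate variances), and the function names are hypothetical. The inputs are the champion and challenger figures from the example earlier in this addendum.

```python
import math

def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_challenger_better(n_champ, p_champ, n_chall, p_chall):
    """Probability that the challenger's true click rate exceeds the champion's,
    under the normal approximation with an unpooled variance (an assumption)."""
    variance = (p_champ * (1 - p_champ) / n_champ
                + p_chall * (1 - p_chall) / n_chall)
    z = (p_chall - p_champ) / math.sqrt(variance)
    return normal_cdf(z)

# Champion: 5% click rate on 10,000 impressions; challenger: 7% on 100.
confidence = prob_challenger_better(10_000, 0.05, 100, 0.07)
print(f"z-based confidence that the challenger is better: {confidence:.0%}")
```

With only 100 impressions behind the challenger, the sketch reports roughly 78%, well short of a 95% level, which is why the higher observed click rate alone does not settle the test.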

When 2 Ads Have a Small Number of Clicks: Fisher’s Exact Test

When our sample sizes are small (as is more often the case with clicks than with impressions), we calculate our confidence metric (that a challenger is outperforming a champion or vice versa) using Fisher’s Exact Test. We use Fisher’s Exact Test when the sum of observations is less than 20 (the test becomes computationally infeasible to perform on a standard server platform for N > 20).




Fisher’s Exact Test relies on exhaustively calculating all possible outcomes and identifying those that match or exceed our observed difference between challenger and champion impressions and clicks. This fraction represents the probability that our difference could have occurred by chance (one minus this fraction is thus our confidence or p-value).

First we define the five standard Fisher quantities (Fisher’s test is usually applied to 2 x 2 tables) as:

We can then calculate our p value as follows:
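A minimal sketch of a one-sided Fisher's Exact Test on a 2 x 2 table follows. The cell names a, b, c, d and the function name are illustrative assumptions, and the code uses the textbook hypergeometric formula rather than Logic361's own implementation.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """Probability of a table at least as favourable to the first row as the
    observed one, with all row and column totals held fixed.

        row 1: a clicks,  b non-clicks   (e.g. challenger)
        row 2: c clicks,  d non-clicks   (e.g. champion)
    """
    row1, row2 = a + b, c + d
    col1 = a + c                      # total clicks across both ads
    n = row1 + row2
    total = comb(n, col1)             # ways to place the clicks among n trials
    max_a = min(row1, col1)
    # Sum hypergeometric probabilities for a, a + 1, ..., max_a.
    tail = sum(comb(row1, k) * comb(row2, col1 - k) for k in range(a, max_a + 1))
    return tail / total

chance = fisher_one_sided(3, 1, 1, 3)   # small hypothetical 2 x 2 example
confidence = 1 - chance
print(f"chance = {chance:.4f}, confidence = {confidence:.4f}")
```

Following the text's convention, `chance` is the fraction of outcomes that match or exceed the observed difference, and one minus that fraction is reported as the confidence.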

In the figure below, we show the implicit distributions for the click rates on two ads, with 1.88% CTR and 2.53% CTR, respectively. Based on these underlying distributions, we have 35% confidence that the higher ad (in red) will outperform the lower (in blue) going forward.




This diagram illustrates our approach visually: we can estimate our confidence value by looking at the amount of overlap between our distributions. The more overlap, the less confident we are that the higher one is higher. We can quantify this value exactly as:

Because the functions are probability distribution functions, we know that the red area sums to 1, so it reduces to simply 1 - area(intersection). In our example, about 65% of the red area is in the overlap, thus only 35% is strictly greater.

In general, we can calculate the area of overlap between two unimodal probability distribution functions X and Y via integration, as follows:
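One common way to compute such an overlap is to integrate the pointwise minimum of the two densities. The sketch below assumes that definition and uses normal densities for simplicity. The means are the two click rates from the example above, but the standard deviations are made up for illustration, so the printed value is not expected to match the 65% in the text.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def overlap_area(pdf_x, pdf_y, lo, hi, steps=20_000):
    """Integrate min(pdf_x, pdf_y) over [lo, hi] with the midpoint rule."""
    width = (hi - lo) / steps
    return sum(min(pdf_x(lo + (i + 0.5) * width), pdf_y(lo + (i + 0.5) * width))
               for i in range(steps)) * width

# Hypothetical click-rate distributions for two ads (the spreads are made up).
blue = lambda x: normal_pdf(x, 0.0188, 0.004)
red = lambda x: normal_pdf(x, 0.0253, 0.005)

overlap = overlap_area(blue, red, lo=-0.02, hi=0.07)
print(f"overlap = {overlap:.2f}, confidence = {1 - overlap:.2f}")
```

The integration limits only need to cover the region where both densities are non-negligible; here they span many standard deviations on either side of both means.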

The following are additional examples:




Calculating the Confidence Metric for the Cost Distributions

We have observable data for impressions, clicks, and acquisitions. Average cost per click is a constant (for a given ad). Given our observable data, we (i) infer the implicit distributions from which these values are drawn (sometimes called the posterior distributions), and (ii) compare two distributions and state a level of confidence that, for future observations, one will remain higher than the other.

We then extend our approach of analyzing probability functions to estimating differences between other metrics, such as cost per acquisition (cpa). The cpa is a function that depends on a random variable for acquisitions, a. Given a pair of cpa measures for two ads, call them cpaX and cpaY, we can generate distribution functions that are conditioned on a, the number of acquisitions.

Where cpa(a) is a function defined (from our table) as:
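The table's example values suggest the definition: total spend is cost per click times clicks ($2.00 x 1,100 = $2200), and dividing by 22 acquisitions gives the $100 cost per acquisition. Under that reading, with c the number of clicks:

```latex
\[
\mathrm{cpa}(a) = \frac{\mathrm{cpc} \cdot c}{a}
\]
```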




And because cpc and c are constants, we only integrate over all the possible values of a. We perform the same analysis for spend to value, plugging in the values from our table and integrating over all possible values of c while holding other variables constant.

Variable              Symbol  Derivation  Example  Posterior Probability Distribution
Impressions           i                   100,000
Clicks                c                   1,100    binom(i, ctr)
Acquisitions          a                   22       binom(c, ar)
Click-thru rate       ctr                 1.1%     beta(i-c, c)
Acquisition rate      ar                  2.0%     beta(c-a, a)
Cost per click        cpc                 $2.00
Cost per acquisition  cpa                 $100
Total spend                               $2200
Total value                               $3300
Spend to value                            66.0%
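The comparison of two cost-per-acquisition distributions can also be approximated by simulation instead of integration. The sketch below is an assumption-laden illustration, not Logic361's algorithm: it draws the acquisition rate from Beta(a + 1, c - a + 1), the usual posterior under a uniform prior (the table writes the posterior as beta(c-a, a)), treats cpc and clicks as constants, and uses hypothetical function names. Ad X takes the table's example values; ad Y is invented for contrast.

```python
import random

def sample_cpa(clicks, acquisitions, cpc, rng):
    """Draw one plausible cost per acquisition for an ad.

    The acquisition rate is drawn from Beta(a + 1, c - a + 1), the posterior
    under a uniform prior (an assumption); cpa then equals cpc / rate.
    """
    rate = rng.betavariate(acquisitions + 1, clicks - acquisitions + 1)
    return cpc / rate

def prob_cheaper(ad_x, ad_y, draws=20_000, seed=0):
    """Estimate the probability that ad X has a lower cost per acquisition than ad Y."""
    rng = random.Random(seed)
    wins = sum(sample_cpa(*ad_x, rng) < sample_cpa(*ad_y, rng) for _ in range(draws))
    return wins / draws

# (clicks, acquisitions, cost per click); ad X uses the table's example values,
# ad Y is a hypothetical competitor with half as many acquisitions.
ad_x = (1100, 22, 2.00)
ad_y = (1100, 11, 2.00)
print(f"P(cpaX < cpaY) is roughly {prob_cheaper(ad_x, ad_y):.2f}")
```

Sampling trades exactness for simplicity: the estimate carries Monte Carlo noise that shrinks as `draws` grows, whereas the integration described above gives a deterministic value.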
