metrics newmedia

Testing for Best Practices in E-mail Advocacy

Nirmal Mankani, New Organizing Institute (@n1rm)

Randomized Experiments

• What works and what doesn’t?
• Experiments are the best way to understand the cause-and-effect relationship between an action we take and an outcome we observe
• Random assignment
  • Each experimental unit (ex. an e-mail recipient) is randomly assigned to a condition (a particular tactic or message)
  • Any difference in observed results, beyond chance, can be attributed to our treatment
• Observational analysis vs. experiments

A/B Testing

Experiment-informed program (a.k.a. A/B testing)

• A/B testing is fantastic, but it is difficult to infer best practices from it
• Often lacks external validity
• Context
• Over-interpretation

Beyond A/B Tests

• Implement tests repeatedly (best practices)
• Think beyond the context of an individual e-mail
• Future consequences of current actions (downstream effects)
• Testing sustained campaigns

Clearly Define a Research Question

For example…

• Optimal path to engagement on signup? Warm-up e-mails? Or a lower-bar action preceding the “big” ask?
• Ratio of types of asks?
• Are list users homogeneous? Or do different groups of list members respond to different appeals? Voter file data?
• Testing themes over time (ex. personal stories vs. facts)
• How much is too much e-mail? Holdout universes.
• Encourage future action through plan-making or reinforcing identity?
• Combining with other modes of contact (ex. text or phones)
• Tracking and comparing progress to other activists, even by geography
• Incentives (recruitment vs. fundraising vs. none)
• Embedded chat rooms on landing pages to encourage calls
• Tactical questions (ex. time of day, fundraising dollar amount)

Data Collection

• Dependent variables (outcomes we care about)
  • Open and click-through rates
  • Advocacy (petitions, letters to the editor, calls to Congress, event attendance)
  • Fundraising ($ amount and # of donors)
  • List size (growth, unsubscribes)
  • Electoral & public opinion (persuasion, turnout)
  • Activity on other networks (ex. Facebook)
• Measurement
  • Reference codes, e-mail address match, other matches
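A minimal sketch of the e-mail address match: join an export of people who took action back to the saved group assignments. The field layout, the sample addresses, and the function names are hypothetical; a real export will need its own cleanup rules.

```python
from collections import Counter


def normalize(email: str) -> str:
    """Lowercase and strip whitespace so the same address matches across tools."""
    return email.strip().lower()


def match_outcomes(assignments, action_emails):
    """Count assigned recipients and matched actors per group.

    assignments: iterable of (email, group) pairs saved at randomization time.
    action_emails: iterable of addresses exported from the action tool.
    Returns {group: (n_assigned, n_acted)}; addresses outside the test
    universe are ignored.
    """
    group_of = {normalize(email): group for email, group in assignments}
    assigned = Counter(group_of.values())
    # De-duplicate so a person who acts twice is counted once.
    actors = {normalize(email) for email in action_emails}
    acted = Counter(group_of[email] for email in actors if email in group_of)
    return {group: (assigned[group], acted[group]) for group in assigned}


# Hypothetical example data.
assignments = [
    ("ann@example.org", "control"),
    ("bob@example.org", "treatment"),
    ("cat@example.org", "treatment"),
    ("dan@example.org", "control"),
]
actions = ["BOB@example.org ", "cat@example.org", "cat@example.org", "zed@example.org"]

result = match_outcomes(assignments, actions)
print(result)
```

Counting against the full assigned group, not only those who opened or clicked, keeps the comparison tied to the original random assignment.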

Universe Size

• A result is statistically significant if it is unlikely to have occurred by chance
  • p-value: the probability of observing a test statistic at least as large as the one observed if the true effect of your intervention is zero
• Statistical power is the likelihood that our test will find a statistically significant effect, if one exists
  • The probability of rejecting a false null hypothesis
• Power calculator: http://www.dssresearch.com/toolkit/spcalc/power_p2.asp
• Decide target universe and sample sizes
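To make the sample-size decision concrete, here is a rough sketch using the standard normal approximation for comparing two proportions with equal group sizes. The 20% vs. 22% open rates, the 0.05 significance level, and the 80% power target are illustrative assumptions, and the linked calculator may use a slightly different formula.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def _spreads(p_control, p_treatment):
    """Standard-deviation terms under the null (pooled) and the alternative."""
    p_bar = (p_control + p_treatment) / 2
    null_sd = sqrt(2 * p_bar * (1 - p_bar))
    alt_sd = sqrt(p_control * (1 - p_control) + p_treatment * (1 - p_treatment))
    return null_sd, alt_sd


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Recipients needed in EACH group for a two-sided two-proportion test."""
    z_alpha = _STD.inv_cdf(1 - alpha / 2)
    z_power = _STD.inv_cdf(power)
    null_sd, alt_sd = _spreads(p_control, p_treatment)
    effect = abs(p_control - p_treatment)
    return ceil(((z_alpha * null_sd + z_power * alt_sd) / effect) ** 2)


def power_for_group_size(n, p_control, p_treatment, alpha=0.05):
    """Approximate power when each group has n recipients."""
    z_alpha = _STD.inv_cdf(1 - alpha / 2)
    null_sd, alt_sd = _spreads(p_control, p_treatment)
    effect = abs(p_control - p_treatment)
    return _STD.cdf((effect * sqrt(n) - z_alpha * null_sd) / alt_sd)


# Illustrative question: how many recipients per group are needed to detect
# a rise in open rate from 20% to 22%?
n = sample_size_per_group(0.20, 0.22)
print(n, round(power_for_group_size(n, 0.20, 0.22), 3))
```

Smaller expected effects drive the required universe up quickly: halving the difference roughly quadruples the sample needed.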

Implementation – Challenges of Using Built-in A/B Test Tools

• Can’t export groups = limited tracking
  • Ex. matching call-in data from an external tool, people who enter the test universe conditional on some action, determining how people perform on subsequent e-mails
• Groups don’t persist over time
  • Prohibits treating a group with multiple e-mail blasts
• Reporting

Implementation – Random Assignment and E-mail Setup

1. Randomly assign the universe to groups (either in your tool or in an external program like Excel)

2. Set up e-mail versions for each group (each version should be identical except for the element you are testing)

3. Send all versions at the same time

4. Save your groups!
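Steps 1 and 4 can be sketched outside the mass e-mail tool. This is one possible approach, not a prescribed one: the group names, the seed value, and the placeholder addresses are assumptions, and the CSV is written to an in-memory buffer so the example has no side effects.

```python
import csv
import io
import random


def assign_groups(emails, groups=("control", "treatment"), seed=2009):
    """Shuffle the universe, then deal recipients into groups round-robin."""
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(emails)
    rng.shuffle(shuffled)
    return {email: groups[i % len(groups)] for i, email in enumerate(shuffled)}


def save_groups(assignment, fileobj):
    """Write the assignment as CSV so the groups can be reused in later blasts."""
    writer = csv.writer(fileobj)
    writer.writerow(["email", "group"])
    for email, group in sorted(assignment.items()):
        writer.writerow([email, group])


# Hypothetical universe of 1,000 list members.
universe = [f"member{i}@example.org" for i in range(1000)]
assignment = assign_groups(universe)

# Swap the buffer for open("groups.csv", "w", newline="") to write a real file.
buffer = io.StringIO()
save_groups(assignment, buffer)
print(buffer.getvalue().splitlines()[:3])
```

Keeping the saved file is what allows later analysis by original assignment and lets the same groups receive multiple blasts.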

Analysis

• Analyze results based on your original group assignments!
• Which test?
  • T-test
  • Chi-square test for categorical outcomes (ex. did or did not open an e-mail)
  • Mann-Whitney test, a nonparametric test, for outcomes that are not normally distributed (ex. fundraising tests)
  • Linear or logistic regression where the treatment indicator is the independent variable (IV) and the outcome is the dependent variable (DV)
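As an illustration of the Mann-Whitney option for fundraising outcomes, here is a small standard-library sketch that uses the tie-corrected normal approximation without a continuity correction. The donation amounts are made up, and the approximation is only reasonable for larger samples; a statistics package will give more careful results.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(sample_a, sample_b):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (u_a, p_value), where u_a is the U statistic for sample_a.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    n = n_a + n_b
    counts = Counter(list(sample_a) + list(sample_b))

    # Tied values share the average of the ranks they occupy.
    rank_of = {}
    next_rank = 1
    for value in sorted(counts):
        ties = counts[value]
        rank_of[value] = next_rank + (ties - 1) / 2
        next_rank += ties

    rank_sum_a = sum(rank_of[x] for x in sample_a)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2

    mean_u = n_a * n_b / 2
    tie_term = sum(t**3 - t for t in counts.values()) / (n * (n - 1))
    var_u = n_a * n_b / 12 * ((n + 1) - tie_term)
    if var_u <= 0:
        return u_a, 1.0  # every value identical: no evidence of a difference
    z = (u_a - mean_u) / sqrt(var_u)
    return u_a, 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical donation amounts in dollars; most recipients give nothing.
control = [0, 0, 0, 0, 10, 0, 0, 25, 0, 0]
treatment = [0, 15, 0, 25, 0, 50, 10, 0, 100, 0]
u_stat, p_value = mann_whitney_u(control, treatment)
print(u_stat, round(p_value, 3))
```

Include the zero-dollar recipients from each original group; dropping non-donors would break the comparison by random assignment.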

Contact Us

[email protected]

HCAN – Diverse Sender Test

Research question: When an organization sends a series of e-mails, does using a different sender for each blast improve overall list performance?

Three treatment e-mails sent from 11/23/09 to 12/04/09, effect measured on e-mails starting 12/08/09

50,000 in treatment group (each e-mail received from different sender), 96,572 in control (same sender every e-mail)


Sender of each successive e-mail, by group:

Treatment:  Sender 1   Sender 2   Sender 3   Levana
Control:    Levana     Levana     Levana     Levana


E-mail                 Date    Control Open Rate   Treatment Open Rate   Control CTR   Treatment CTR
Senate Vote Followup   11/23   26.1%**             33.4%**               12.5%**       15.3%**
Stop Stupak            12/01   21.6%**             23.4%**               13.0%**       14.0%**
Build for the win      12/04   21.2%**             23.1%**               2.6%          2.6%

** Significant at 0.01

HCAN – Diverse Sender Test

E-mail                   Date    Control Open Rate   Treatment Open Rate   Control CTR   Treatment CTR
Public Option Petition   12/08   23.8%*              24.3%*                1.6%          1.7%
Information Update       12/19   23.5%**             24.2%**               0.5%          0.5%

* Significant at 0.05 ** Significant at 0.01
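As a rough check on the open-rate stars for the two follow-up e-mails, the sketch below runs a two-proportion z-test, which is equivalent to a chi-square test on a 2x2 table. It assumes every member of each group (96,572 control, 50,000 treatment) received each e-mail and works from the rounded rates; the slides do not say which test was actually used, so treat the output as approximate.

```python
from math import sqrt
from statistics import NormalDist

# Group sizes from the test description.
N_CONTROL = 96_572
N_TREATMENT = 50_000


def two_proportion_p_value(rate_a, n_a, rate_b, n_b):
    """Two-sided p-value for the difference between two proportions."""
    pooled = (rate_a * n_a + rate_b * n_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# (control open rate, treatment open rate), as reported on the slide.
open_rates = {
    "Public Option Petition (12/08)": (0.238, 0.243),
    "Information Update (12/19)": (0.235, 0.242),
}

p_values = {
    name: two_proportion_p_value(control, N_CONTROL, treatment, N_TREATMENT)
    for name, (control, treatment) in open_rates.items()
}
for name, p in p_values.items():
    print(f"{name}: p = {p:.4f}")
```

Under these assumptions the 12/08 difference falls between the 0.05 and 0.01 thresholds and the 12/19 difference falls below 0.01, which matches the stars in the table.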
