metrics newmedia
Testing for Best Practices in E-mail Advocacy
Nirmal Mankani, New Organizing Institute (@n1rm)
Randomized Experiments
• What works and what doesn’t?
• Experiments are the best way to understand a cause-and-effect relationship between an action we take and an outcome we observe
• Random assignment
  • Experimental units (ex. e-mail recipients) are randomly assigned to each condition (a particular tactic or message)
  • Any difference in observed results, beyond chance variation, can be attributed to our treatment
• Observational analysis vs. experiments
A/B Testing
Experiment-informed program (a.k.a. A/B testing)
• A/B testing is fantastic, but it is difficult to infer best practices from individual tests
• Often lacks external validity
  • Context
  • Overinterpretation
Beyond A/B Tests
• Implement tests repeatedly (best practices)
• Think beyond the context of an individual e-mail
  • Future consequences of current actions (downstream effects)
  • Testing sustained campaigns
Clearly Define a Research Question
For example…
• Optimal path to engagement on signup? Warm-up e-mails? Or a lower-bar action preceding the “big” ask?
• Ratio of types of asks?
• Are list users homogeneous? Or do different groups of list members respond to different appeals? Voter file data?
• Testing themes over time (ex. personal stories vs. facts)
• How much is too much e-mail? Holdout universes.
• Encourage future action through plan-making or reinforcing identity?
• Combining with other modes of contact (ex. text or phones)
• Tracking and comparing progress to other activists, even by geography
• Incentives (recruitment vs. fundraising vs. none)
• Embedded chat rooms on landing pages to encourage calls
• Tactical questions (ex. time of day, fundraising dollar amount)
Data Collection
• Dependent variables (outcomes we care about)
  • Open and click-through rates
  • Advocacy (petitions, letters to the editor, calls to Congress, event attendance)
  • Fundraising ($ amount and # donors)
  • List size (growth, unsubscribes)
  • Electoral & public opinion (persuasion, turnout)
  • Activity on other networks (ex. Facebook)
• Measurement
  • Reference codes, e-mail address match, other matches
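The e-mail address match mentioned under Measurement can be sketched as follows. This is only an illustration: the function names, the normalization rule (trim whitespace, lowercase), and the shape of the inputs are assumptions, not part of any particular e-mail tool.

```python
from collections import Counter


def normalize(email):
    """Normalize an address so the same person matches across data sources."""
    return email.strip().lower()


def action_rates(assignment, action_emails):
    """Return the share of each test group that took the action.

    assignment: dict mapping e-mail address -> group name (hypothetical format).
    action_emails: iterable of addresses collected by the outcome source,
        for example a petition export or a call-in tool.
    """
    acted = {normalize(e) for e in action_emails}
    sizes = Counter()
    actions = Counter()
    for email, group in assignment.items():
        sizes[group] += 1
        if normalize(email) in acted:
            actions[group] += 1
    # Addresses that appear in the outcome data but not in the test
    # universe are ignored, since they were never assigned to a group.
    return {group: actions[group] / sizes[group] for group in sizes}
```

The denominator is the full assigned group, not only the people who opened or clicked, so the rates stay comparable across conditions.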
Universe Size
• A result is statistically significant if it is unlikely to have occurred by chance
  • p-value: probability of observing a test statistic at least as large if the true effect of your intervention is zero
• Statistical power is the likelihood that our test will find a statistically significant effect, if it exists
  • Probability of rejecting a false null hypothesis
• Power calculator: http://www.dssresearch.com/toolkit/spcalc/power_p2.asp
• Decide target universe and sample sizes
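As a rough sketch of the kind of calculation a power calculator performs for a test comparing two rates (ex. open rates), the standard normal-approximation formula for two proportions can be computed directly. The function name, the default significance level of 0.05, and the default power of 0.80 are assumptions chosen for illustration, not values from this presentation.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate recipients needed in EACH group for a two-sided test.

    p_control, p_treatment: the rates you expect in each condition
        (ex. 0.20 and 0.22 open rates). These are guesses you supply.
    alpha: significance level (assumed default 0.05).
    power: desired probability of detecting the effect (assumed default 0.80).
    """
    if p_control == p_treatment:
        raise ValueError("The two rates must differ to compute a sample size.")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treatment) / 2
    # Normal approximation for comparing two independent proportions.
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_power * math.sqrt(
            p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
        )
    ) ** 2
    return math.ceil(numerator / (p_control - p_treatment) ** 2)
```

Smaller expected differences or higher desired power both push the required group size up, which is why the expected effect should be settled before deciding the universe.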
Implementation – Challenges of Using Built-in A/B Test Tools
• Can’t export groups = limited tracking
  • Ex. matching call-in data from an external tool, people who enter the test universe conditional on some action, determining how people perform on subsequent e-mails
• Groups don’t persist over time
  • Prohibits treating a group with multiple e-mail blasts
• Reporting
Implementation – Random Assignment and E-mail Setup
1. Randomly assign the universe to groups (either in your tool or in an external program like Excel)
2. Set up e-mail versions for each group (each version should be identical except for the element you are testing)
3. Send all versions at the same time
4. Save your groups!
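Steps 1 and 4 can also be done in a short script instead of a spreadsheet. The sketch below is one possible approach, assuming the universe is a list of unique e-mail addresses; the function names, the fixed seed, and the CSV layout are illustrative choices, not features of any e-mail tool.

```python
import csv
import random


def assign_groups(emails, conditions, seed=2009):
    """Randomly assign each address to one condition, in near-equal groups.

    A fixed seed (an arbitrary assumed value) makes the assignment reproducible.
    Assumes the addresses in `emails` are unique.
    """
    rng = random.Random(seed)
    shuffled = list(emails)
    rng.shuffle(shuffled)
    # After shuffling, dealing addresses out in turn gives each condition
    # a random subset of (almost) equal size.
    return {
        email: conditions[i % len(conditions)]
        for i, email in enumerate(shuffled)
    }


def save_groups(assignment, path):
    """Write the assignment to a CSV file so the groups persist after the send."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["email", "group"])
        for email, group in sorted(assignment.items()):
            writer.writerow([email, group])
```

For example, calling assign_groups(universe, ["control", "treatment"]) and then save_groups(result, "groups.csv") would produce a file that can be matched against outcome data later; "groups.csv" is a placeholder name.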
Analysis
• Analyze results based on your original group assignments!
• Which test?
  • T-test
  • Chi-square test for categorical outcomes (ex. did or did not open an e-mail)
  • Mann-Whitney test (nonparametric) for outcomes that are not normally distributed (ex. fundraising tests)
  • Linear or logistic regression where the treatment indicator is the IV and the outcome is the DV
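For a categorical outcome such as opened vs. did not open, the chi-square test on a 2x2 table can be sketched with the standard library alone. This is Pearson's chi-square without a continuity correction; the function name and the counts in the usage note are hypothetical, not results from this presentation.

```python
import math


def chi_square_2x2(success_a, n_a, success_b, n_b):
    """Pearson chi-square test comparing two groups on a yes/no outcome.

    success_a, success_b: number of people in each group who took the action.
    n_a, n_b: number of people originally assigned to each group.
    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    observed = [
        [success_a, n_a - success_a],
        [success_b, n_b - success_b],
    ]
    total = n_a + n_b
    row_totals = [n_a, n_b]
    col_totals = [success_a + success_b, total - success_a - success_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

With made-up counts, chi_square_2x2(2000, 10000, 2200, 10000) compares a 20% rate against a 22% rate in groups of 10,000 each. The group sizes passed in should be the original assignments, in line with the first bullet above.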
Contact Us
HCAN – Diverse Sender Test
Research question: Does using a different sender for each e-mail blast improve overall list performance?
Three treatment e-mails sent from 11/23/09 to 12/04/09, effect measured on e-mails starting 12/08/09
50,000 in treatment group (each e-mail received from different sender), 96,572 in control (same sender every e-mail)
HCAN – Diverse Sender Test
Treatment:  Sender 1   Sender 2   Sender 3   Levana
Control:    Levana     Levana     Levana     Levana
HCAN – Diverse Sender Test
E-mail                  Date    Control Open Rate   Treatment Open Rate   Control CTR   Treatment CTR
Senate Vote Followup    11/23   26.1%**             33.4%**               12.5%**       15.3%**
Stop Stupak             12/01   21.6%**             23.4%**               13.0%**       14.0%**
Build for the win       12/04   21.2%**             23.1%**               2.6%          2.6%
** Significant at 0.01
HCAN – Diverse Sender Test
E-mail                   Date    Control Open Rate   Treatment Open Rate   Control CTR   Treatment CTR
Public Option Petition   12/08   23.8%*              24.3%*                1.6%          1.7%
Information Update       12/19   23.5%**             24.2%**               0.5%          0.5%
* Significant at 0.05  ** Significant at 0.01