AB Testing

There are many strategies and many different ways to interact with your users. Do your users prefer funny messages or more straight-to-the-point communication? It is important to compare different options in order to find out which one works better for your app. AB Testing allows you to run a campaign featuring two or more variations of your message or workflow side by side, each on a set proportion of your campaign audience. By measuring engagement and conversion rates you can get an idea about which variant resonates best with your users as a whole, as well as drill down to find pockets of users who might react better to more targeted or personalised content. It helps you learn what works better with your users allowing you to base your decisions on actual data instead of assumptions and intuition.

Our implementation of AB Testing gives you full control, allowing you to create a workflow where only one word is different, all the way to entirely opposite workflows including send-out times, message formats and frequencies.

Before you start

With a great tool comes great responsibility. There are a few things you need to keep in mind in order to get statistically valid results from your experiments.

First, think about what you want to achieve. Your conversion goals are the metrics you will use to determine whether or not one variation is more successful than the other. This should be reflected in the Conversion Goal you select for your campaign. Once you have identified a goal you can generate different ideas and hypotheses that you think might work better than the original version.

Then think about the target audience for your campaign and how you want to run the test. Is it an important seasonal campaign, where you want to quickly compare two options on a small percentage of your user base and then roll out the best one to the rest of your users, or a learning experiment whose results will improve your general communication strategy in the future?

Also keep in mind that you will always need a minimum sample size, in this case a minimum number of users, for each variation in order to get statistically significant and valid results from a test. Running a test on a very small group leads to decisions based on pure randomness rather than solid data.
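
As a rough illustration of what “big enough” means, the classic two-proportion formula below estimates how many users you would need per variation. The baseline and expected conversion rates are made-up numbers, and this is a back-of-the-envelope sketch rather than the platform’s own calculation.

```python
# Back-of-the-envelope sample-size estimate for a two-variant test,
# using only the Python standard library (not the platform's calculation).
from statistics import NormalDist

def min_sample_size(baseline_rate, expected_rate, alpha=0.05, power=0.8):
    """Users needed per variation to detect the given difference."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance level
    z_power = NormalDist().inv_cdf(power)           # desired statistical power
    variance = (baseline_rate * (1 - baseline_rate)
                + expected_rate * (1 - expected_rate))
    effect = abs(expected_rate - baseline_rate)
    return int((z_alpha + z_power) ** 2 * variance / effect ** 2) + 1

# Example (made-up numbers): detecting a lift from a 5% to a 6% conversion rate
print(min_sample_size(0.05, 0.06))   # roughly 8,000+ users per variation
```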

Once you know your goal, your audience and your variations, you can create the campaign in our platform and start running the experiment. Let the campaign run its course until you have enough data for the experiment to be significant, then analyze the results and decide on the next steps.

Setting up an experiment

AB Testing is available for any kind of campaign. When creating a campaign, simply click the “A-B Testing” toggle to add workflow variants. You can add as many variants as you want and give each of them a specific weight, which determines the percentage of users that will receive each workflow; keep in mind, however, that the more variants you add and the less weight each one carries, the harder it will be to gather enough data for significant results. Each user is then assigned to one workflow and receives only actions from that workflow for the duration of the campaign.
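
Conceptually, the weighting works like a weighted random draw per user. The sketch below illustrates the idea with made-up variant names and weights; it is not the platform’s actual implementation.

```python
# Illustrative sketch of weight-based variant assignment (not the platform's
# internal implementation). The weights are the percentages you configure.
import random

variants = {"workflow_A": 50, "workflow_B": 30, "workflow_C": 20}  # made-up names/weights

def assign_variant(variants):
    """Pick one variant with probability proportional to its weight."""
    names = list(variants)
    weights = list(variants.values())
    return random.choices(names, weights=weights, k=1)[0]

assignments = {user: assign_variant(variants) for user in ["u1", "u2", "u3"]}
print(assignments)   # e.g. {'u1': 'workflow_A', 'u2': 'workflow_C', 'u3': 'workflow_A'}
```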

Enable A-B Testing and add new variants to your campaign

It is also possible to compare the performance of the campaign against a group of users who don’t receive any actions, i.e. a control group for this particular campaign experiment (as opposed to the global control group of users who don’t receive actions from any campaign). Use the control group to see if your communication has any impact at all on your users’ behavior.

When adding a workflow, you can start with a new blank workflow if your campaign is not too complex or there are lots of differences between variants. However, you always have the option of using an existing workflow upon which to base your new variant and save a few clicks here and there.

Distribution

You can set up any number of variants of your message and assign a proportion of the audience from 0-100%.

a note about randomness

Depending on the size of the audience you are targeting, you might not get exactly the distribution you intended: you may end up with somewhat more of one variant and less of another than your settings would suggest.

Think of it this way: if you toss a coin five times, there is a reasonable chance that you will end up with five heads or five tails; the more times you toss the coin, the closer you will get to an even split of 50% heads and 50% tails.
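
If you want to see this for yourself, the short simulation below tosses a fair coin different numbers of times; it is only an illustration of the coin-toss analogy.

```python
# Small samples drift far from 50/50; large samples settle close to it.
import random

for tosses in (5, 50, 500, 50_000):
    heads = sum(random.random() < 0.5 for _ in range(tosses))
    print(f"{tosses:>6} tosses -> {heads / tosses:.1%} heads")
```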

Target groups are ‘sticky’, which means that once a user has been assigned a certain variant, they won’t be taken out of that group if you change the distribution; only new users will be allocated until your desired spread is matched. This is not the case for users in the control group, which gives you the opportunity to run a campaign for a small portion of the target audience and then, if the numbers look good, send the campaign to the users who were initially assigned to the control group.
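
One way to picture the ‘sticky’ behaviour is a saved user-to-variant mapping that is consulted before any new draw. The sketch below is purely illustrative and not how the platform actually stores assignments.

```python
# Sketch of 'sticky' assignment (not the platform's storage model): a user who
# already has a variant keeps it; only users without one get a fresh draw, so
# changing the weights later only affects newly assigned users.
import random

saved_assignments = {}   # user_id -> variant, kept between campaign runs

def sticky_assign(user_id, variants):
    """Return the stored variant if the user already has one; otherwise draw a new one."""
    if user_id not in saved_assignments:
        names, weights = zip(*variants.items())
        saved_assignments[user_id] = random.choices(names, weights=weights)[0]
    return saved_assignments[user_id]

first = sticky_assign("u1", {"A": 50, "B": 50})
second = sticky_assign("u1", {"A": 90, "B": 10})   # weights changed later
assert first == second                             # the user keeps the original variant
```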

Advanced user assignment

Assigning users to the global control group, the campaign control group and each of the campaign workflow variants is completely independent. A user could potentially be assigned to the global control group, the campaign control group and workflow A at the same time (this user will not receive any actions). For example, take an app with a global control group of 10% running a campaign experiment with a 50/50 distribution between variants A and B and a campaign control group of 20%. The expected breakdown for every 1000 users will be:

  • 10 users are assigned to the global control group, campaign control group and workflow variant A (do not receive any actions).
  • 10 users are assigned to the global control group, campaign control group and workflow variant B (do not receive any actions).
  • 40 users are assigned to the global control group and workflow variant A (do not receive any actions).
  • 40 users are assigned to the global control group and workflow variant B (do not receive any actions).
  • 90 users are assigned to the campaign control group and workflow variant A (do not receive any actions).
  • 90 users are assigned to the campaign control group and workflow variant B (do not receive any actions).
  • 360 users receive workflow variant A.
  • 360 users receive workflow variant B.

In total, 72% of users receive campaign actions and 28% do not receive any actions.
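
These numbers follow directly from multiplying the independent probabilities. The sketch below reproduces the example’s expected breakdown for 1000 users.

```python
# Reproduces the example above: independent 10% global control group, 20%
# campaign control group and a 50/50 split between variants A and B.
users = 1000
global_cg, campaign_cg = 0.10, 0.20          # 10% global, 20% campaign control group
split = {"A": 0.5, "B": 0.5}                 # 50/50 between the two variants

for variant, share in split.items():
    print(f"global CG + campaign CG + {variant}: {users * global_cg * campaign_cg * share:.0f}")
    print(f"global CG + {variant}: {users * global_cg * (1 - campaign_cg) * share:.0f}")
    print(f"campaign CG + {variant}: {users * (1 - global_cg) * campaign_cg * share:.0f}")
    print(f"receive {variant}: {users * (1 - global_cg) * (1 - campaign_cg) * share:.0f}")

print(f"receiving actions: {(1 - global_cg) * (1 - campaign_cg):.0%}")   # 72%
```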

What kind of experiments can I run?

Our A-B Testing tool is quite flexible, allowing you to create workflows as similar or as different as you want within the same campaign, whether that means different attached images, slightly altered wording in the message, or contacting users in the morning versus mid-morning, evening or just before bed.

To compare small variations of your message text, simply add all the information needed to the workflow as you would for any campaign, and then use the “Add variation” button to clone the workflow into variant B. Make the necessary adjustments to variant B and you’re done.

If you want to know whether your message has an impact at all, you can compare against a control group that is not contacted as part of your campaign but that otherwise fits all the criteria. We watch these control users during the course of the campaign to see if, and how many of them, would be organically inclined to reach the conversion goal without the gentle nudge of a few well-timed push or in-app messages.
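
To judge whether recipients really converted more often than the control group, one common check is a two-proportion z-test. The sketch below uses made-up conversion numbers and is not the platform’s built-in analysis.

```python
# Did recipients convert more often than the campaign control group?
# Standard two-proportion z-test, standard library only, made-up numbers.
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conversions_a, n_a, conversions_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conversions_a / n_a, conversions_b / n_b
    pooled = (conversions_a + conversions_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Made-up numbers: 540 of 9,000 recipients converted vs 430 of 9,000 control users
z, p = two_proportion_z_test(540, 9000, 430, 9000)
print(f"z = {z:.2f}, p = {p:.4f}")   # a small p-value suggests a real difference
```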

A-B testing results

Pitfalls

There are many, many mistakes that can be made with AB testing. If you’re investing time into performing split testing, you don’t want to waste it - so try to avoid falling into any of these traps.

  1. Premature termination: Stopping an experiment before you have enough data to make it statistically valid is a very common mistake. Give it time to reach the required significance.
  2. Generalization: You might start a test on Monday and after three days you’ve reached the required minimum sample size, so you have a significant result - great! Unfortunately, you have to accept a major caveat: the result is only valid for those three days, Monday to Wednesday. It is possible (and very likely) that user behaviour differs across each day of the week, so if you want a conclusion that is valid for every day, you should run experiments that cover a full week. In addition, seasonality needs to be accounted for in many cases. An experiment over Christmas (or any major holiday period) is likely to have different results compared to one run during a more run-of-the-mill time of year. Be careful when generalizing your results and making assumptions about your whole user base or about a time other than when the experiment was run.
  3. Overcomplicating: There is a story about a design experiment at Google where they tested the effect of 41 different shades of blue on a toolbar. This is an extreme example of A/B/C/D/etc testing - and almost legendary in the AB testing world. The amount of traffic required to make a significant experiment with too many options can be very large, so keep things simple and test just a few options each time.
  4. Overexpecting: Don’t expect all experiments to go your way or show very positive results. At the end of the day, that’s why you need to test - to base your decisions on hard data and improve with time and repetition. The most important thing is that you always learn something from your experiments, even if that something is not what you expected.
Suggested resources
Evan Miller - How Not To Run An A/B Test
Chris Stucchio - No Free Samples
Peep Laja - 12 A/B Split Testing Mistakes I See Businesses Make All The Time
Mixpanel - Mobile A/B Testing: Walkthrough & Best Practices
Optimove - Statistical Significance in Marketing
Optimizely - What is A/B Testing?
VWO - The Complete Guide to A/B Testing