- The importance of A/B testing,
- How to set up such a use case,
- How to evaluate such a use case.
Companies face various problems. Ways to identify your company's current pain points include comparing your performance on basic metrics with industry benchmarks, or using the Online Retail Formula. The conclusions usually lead to use cases focused on increasing revenue per visitor (RPV), conversion rate, average order value (AOV), etc. Increases in these metrics are considered solutions to the problems.
Use cases are most often based on an understanding of the conversion process. When you break the process down into granular steps, you can decide to improve a part of the process - with a use case.
Have a clear hypothesis before launching use cases, such as: “By highlighting the stock availability on product detail page we believe we will improve cart-to-detail ratio by 10%.”
You should be able to answer the following questions:
- What are you hoping to achieve?
- Why are you launching it
  - for this particular group of customers,
  - under these conditions,
  - in this form?
This thinking process during the setup can also help you at a later point, when tweaking the use case for another A/B test if the test ends up not performing as expected.
Determine how long the A/B test should run for before launching it. An A/B test has to run for a particular period of time before you can look at the results, safely make conclusions, and base actions on the data.
The test should also run for at least 1 (ideally 2) full business cycles. A business cycle is a week in 95% of cases, and covering full cycles ensures that you capture all kinds of customers, behavior, traffic, regular campaigns, etc. The test should start and stop at the same point in the business cycle. Watch out for unusual events, too: for example, running the test for 2 weeks and sending an exceptionally large or successful newsletter during that time will skew the results.
Ideally, always start with a 50:50 test. Do not change the ratio until you are sure of the results. Iterate and change the Variant over time rather than starting with too many Variants, as they would prolong the duration of the test.
If you use a split other than 50:50 between the Control Group and the Variant, you need to segment the customers during evaluation based on the number of merge events and evaluate the use case through those segments. Otherwise, the evaluation is unfair and skewed toward the Variant with the higher percentage.
Do not change the A/B test ratio of a running campaign. If you really need to change the ratio, set up a new campaign (new A/B split node, new weblayer, new experiment, ...) and evaluate only the new campaign.
Customers who had already been assigned to a Variant remain in that Variant, and this can skew the evaluation. Imagine an AA test (comparing two identical Variants) where the A1:A2 ratio is 20:80 for the first 2 weeks and is then changed to 80:20 for another 2 weeks. The sample sizes will be almost the same, but the heavy visitors/purchasers, who mostly show up and get assigned early, will be concentrated in the Variant that held the larger share during the first 2 weeks. The evaluation will then declare that Variant the clear winner, even though both Variants are identical.
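The effect of sticky assignment combined with a changed ratio can be checked with simple arithmetic. The sketch below assumes, purely for illustration, that all heavy visitors first arrive in the first period, while light visitors arrive in both periods. The function name and all counts are hypothetical.

```python
def group_composition(early_pct, late_pct,
                      heavy_early=1000, light_early=1000, light_late=2000):
    """Return (group size, share of heavy visitors) for one Variant.

    early_pct / late_pct: percentage of newly arriving customers assigned to
    this Variant in the first / second period. Assignment is sticky, so
    customers keep the Variant they received on their first visit.
    """
    heavy = heavy_early * early_pct // 100
    light = light_early * early_pct // 100 + light_late * late_pct // 100
    size = heavy + light
    return size, heavy / size


# One Variant gets 20% of new customers early and 80% later; the other the reverse.
print(group_composition(20, 80))  # (2000, 0.1)
print(group_composition(80, 20))  # (2000, 0.4)
```

Both groups end up with 2,000 customers, yet the Variant with the larger early share holds four times as many heavy visitors, so it looks better without any real difference between the Variants.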
You want your A/B test to be clean and random - so that the results are relevant and reflect reality. You can help this by cleaning your customer base of irrelevant customers before you send them to the A/B test.
If you are building your A/B test in a scenario, make sure your conditions are defined before the A/B split node.
If sending an email, the email node will automatically ensure that customers who do not have the email attribute, or the appropriate consent, do not receive the email. However, for the A/B test to be correct, you also need to eliminate such people from the Control Group, so you need explicit conditions excluding customers without email attribute OR relevant consent from the A/B test.
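The ordering described above (eligibility conditions first, split second) can be sketched in a few lines. This is a minimal illustration, not how the scenario nodes are implemented: the customer fields `id`, `email`, and `consent`, the function names, and the hash-based assignment are all assumptions made for the example.

```python
import hashlib


def is_eligible(customer):
    """Keep only customers who could actually receive the email."""
    return bool(customer.get("email")) and customer.get("consent") is True


def assign_group(customer_id, variant_share=0.5):
    """Deterministic pseudo-random assignment based on a hash of the customer ID."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "Variant" if bucket < variant_share else "Control Group"


def split_eligible(customers):
    """Apply the conditions BEFORE the split so both groups are equally clean."""
    return {c["id"]: assign_group(c["id"]) for c in customers if is_eligible(c)}


customers = [
    {"id": "c1", "email": "a@example.com", "consent": True},
    {"id": "c2", "email": None, "consent": True},              # no email attribute
    {"id": "c3", "email": "c@example.com", "consent": False},  # no consent
]
print(split_eligible(customers))
```

Because ineligible customers never reach the split, the Control Group contains only people who would have received the email had they been in the Variant.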
Similarly, you want to ensure your A/B test groups are clean in weblayers/banners, too.
Viewcount banners usually contain a condition that specifies the minimum of views an item must have recently had for the banner to be displayed (and tracked).
Make sure to create an evaluation dashboard for each use case before (or within a few days of) launching it.
Come back to it after having launched the use case to:
- check that everything is running, and being tracked,
- polish the dashboard.
You can include basic information about the use case in the dashboard. Documentation (along with processes) becomes increasingly important for the smooth functioning of organizations as they grow. Such information can include:
- brief description of the use case
- brief description of each Variant
- date when the use case was launched
- date when anything about the use case changed (this should also start a new evaluation period for the use case)
- attribution window used within the dashboard
In order to make sure that your use case provides you with valid information, you should filter out your employees, your agencies, Bloomreach Engagement, and outliers - customers with unusually high order value or frequency.
This should be done at the end of the evaluation, as the cut-off will vary per project/use case and will not be known before launch. You can plot the distribution of the number of purchases and of their total amount, and decide on the cut-off based on it.
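One possible way to turn the distribution into a cut-off is a high percentile. The sketch below assumes a 99th-percentile rule, which is only an illustrative choice, since the right cut-off depends on the project. The function names and the data are hypothetical.

```python
from statistics import quantiles


def percentile_cutoff(values, pct=99):
    """Value at the given percentile (needs at least two data points)."""
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
    return cuts[pct - 1]


def filter_outliers(total_by_customer, pct=99):
    """Drop customers whose total purchase amount exceeds the percentile cut-off."""
    cutoff = percentile_cutoff(list(total_by_customer.values()), pct)
    return {cid: total for cid, total in total_by_customer.items() if total <= cutoff}


# Hypothetical data: 99 ordinary customers and one with an unusually high total.
totals = {f"customer_{i}": 50 for i in range(99)}
totals["bulk_buyer"] = 10000
cleaned = filter_outliers(totals)
print(len(cleaned), "bulk_buyer" in cleaned)  # 99 False
```

The same approach can be applied to purchase frequency by passing the number of purchases per customer instead of the total amount.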
Time-Saving Note: You want to avoid adding the same filters to every single component of the dashboard. To do that, create the filter directly in the A/B test segmentation. If you have a set of filters that you always use to exclude customers from evaluations, create a global segmentation that includes all of these filters. You then only need to add one condition to the customer filter of each evaluation, instead of all of them.
It may happen that your use case is really hurting the website. It is making such a negative impact that the results really will not get better by the end of the predetermined testing period.
Let us say you need to wait for 20 days to get significance. After 10 days, you should have a look at what is happening. Uplifts should be positive. If they are negative, use the Bayesian calculator to decide whether to stop the use case (if the results are really bad) or not.
After waiting for the period of time you determined at the launch of the use case, you need to see whether your use case has achieved the hypothesized results. A Bayesian calculator gives you 3 numbers: the probability that the Variant is better than the Control Group, the expected uplift if the Variant is actually better, and the expected loss if the Variant is actually worse. Based on those numbers you (with our advice) need to decide whether to use the use case or not.
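To make the three numbers concrete, here is a minimal Monte Carlo sketch of a Bayesian comparison of two conversion rates. It assumes a uniform Beta(1, 1) prior and relative uplift/loss, and it is not necessarily how the calculator in the product computes its results. The function name and the example counts are hypothetical.

```python
import random


def bayesian_ab(control_conv, control_n, variant_conv, variant_n, draws=20000, seed=0):
    """Estimate the three decision numbers by sampling from Beta posteriors."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    wins = losses = 0
    uplift_sum = loss_sum = 0.0
    for _ in range(draws):
        c = rng.betavariate(1 + control_conv, 1 + control_n - control_conv)
        v = rng.betavariate(1 + variant_conv, 1 + variant_n - variant_conv)
        relative_diff = (v - c) / c
        if v > c:
            wins += 1
            uplift_sum += relative_diff
        else:
            losses += 1
            loss_sum += -relative_diff
    return {
        "prob_variant_better": wins / draws,
        "expected_uplift_if_better": uplift_sum / wins if wins else 0.0,
        "expected_loss_if_worse": loss_sum / losses if losses else 0.0,
    }


# Hypothetical counts: 500/10,000 conversions in Control Group, 600/10,000 in Variant.
print(bayesian_ab(500, 10000, 600, 10000))
```

A high probability with a small expected loss supports keeping the Variant, while a low probability with a large expected loss is a reason to stop it.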
You should also check the trend of the conversion - it should be consistently higher for the Variant (i.e. not generally lower, with one major spike that may have been caused by an unrelated circumstance). Keep in mind that for the first days/week the pattern is highly random meaning no conclusions can be derived from it.
You should check how the Variant influences new and existing users separately. It may be hurting the existing user base. If you find this to be the case, you may want to run this Variant only on new users (those who first appeared after the start of the test).
If the results are not good enough to implement the use case but you still see a point in it - you believe the use case resolves a business problem that is still relevant - you need to tweak it and start the process of launching the use case again.
Do you understand why the test did not ‘confirm’ your hypothesis? Diving deeper into the data might show that it did, in fact, work for a specific segment, at a specific time, or under specific conditions (e.g. only on the first impression of a banner), and give you a new hypothesis to test. You can find lessons that lead to a better-informed test, and (customer) insights that are valuable for you. Finding and communicating these insights can still make the test worth it.
Multiple Variants (NOT multivariate) are suited best for pretesting (choosing the best Variant) followed by testing the best performing Variant with the Control Group. Keep in mind multiple Variant A/B tests are likely to take longer to gain significance than simple A/B tests.
When testing multiple Variants against a Control Group, you need to compare all the Variants that are better than the Control Group against the Control Group itself to see what is the probability that they are really better.
If more than one Variant beats the Control Group, you need to compare these Variants with each other and pick the winner based on both the implementation cost and the uplift it brings.
For example, if the second-best Variant is almost as good as the best one but costs half as much to implement, you will pick the second-best Variant in most cases.