AB Testing

What's AB testing?

AB testing is the technical term for testing the effectiveness of variants of your website. A variant is a change to or version of something on your site, like a redirect or a ranking rule. A collection of variants makes up an experience. When we talk about AB tests, think of experiences the way you might think of experimental groups and control groups.

Despite the name, AB testing can involve more than two experiences. You can test up to three experiences, though it's generally easier to find statistically significant results with fewer experiences, because each experience receives a smaller share of your traffic as you add more. It's up to you what constitutes a manageable number, but we recommend 2–3 experiences per AB test.

Caching and cookies

Bloomreach's AB testing will not work on your site if you employ any type of caching. Additionally, the AB testing traffic split works only if you have integrated the _br_uid_2 cookie correctly. For details on integrating this cookie, refer to our Integration Guide. Note that the _br_uid_2 values sent in the pixel and in API requests must match.
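As a minimal sketch of keeping those values in sync, the snippet below reads the _br_uid_2 cookie that the Bloomreach pixel sets and forwards the same value on a search API call. The endpoint URL and account_id here are placeholders, not real values; take the actual endpoint and parameters from the Integration Guide.

```typescript
// Illustrative only: forward the _br_uid_2 cookie set by the
// Bloomreach pixel to your search API calls so the pixel and API
// report the same visitor ID. Endpoint and account_id are placeholders.

/** Read the _br_uid_2 cookie from the browser, if present. */
function getBrUid2(): string | undefined {
  const match = document.cookie.match(/(?:^|;\s*)_br_uid_2=([^;]*)/);
  return match ? decodeURIComponent(match[1]) : undefined;
}

/** Call the search API with the same _br_uid_2 value the pixel uses. */
async function search(query: string): Promise<unknown> {
  const params = new URLSearchParams({
    q: query,
    account_id: "YOUR_ACCOUNT_ID", // placeholder
  });
  const brUid2 = getBrUid2();
  if (brUid2 !== undefined) {
    // If this value diverges from the pixel's, the AB traffic split breaks.
    params.set("_br_uid_2", brUid2);
  }
  const response = await fetch(`https://search.example.com/api?${params}`);
  return response.json();
}
```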

How does AB testing differ from split testing or bucket testing?

You might hear AB testing called split testing or bucket testing. All of these terms refer to essentially the same testing technique. For simplicity, Bloomreach only uses the term AB testing.

What's the value of testing?

Testing helps you understand the effects of changes on your site so that you can make data-driven decisions.

For example, what's your prediction for the outcome of burying office chairs in results for bedroom furniture searches? Sure, it sounds logical: if I'm looking for bedroom furniture, I'm probably not interested in an office chair. But what if I like having an office alcove in my bedroom? What if I live in a studio apartment? What if I'm a college student? Or maybe I'm a very dedicated business professional who starts every morning by rolling out of bed and answering email?

Sometimes, what sounds logical doesn't reflect customers' realities. AB testing shows the effect of proposed changes on key performance indicators. Use this data to execute your decisions quickly and successfully.

Is an AB test the same as a before-and-after comparison?

📘

Here's the short version

A before-and-after comparison gives you unreliable results because you can't control the many external factors that can affect the performance of your changes.

An AB test gives you the information you need to make a data-driven decision about your changes.

No. When you run an AB test, you set up experiments, let them run, and examine the results before fully committing to changes. AB testing is the gold standard for measuring the effects of your changes. An AB test isolates the effects of your changes by randomly exposing your change to a subset of your site visitors. Everything except your change is identical to the experience of your other site visitors. 

Before-and-after comparisons don't allow you to pinpoint the effects of your change. What if you decide to change the ranking of products on a page today, and tomorrow, your site has a huge sale? A before-and-after comparison can't tell you whether your ranking change drove an increase in revenue or the sale was responsible. It might even be both. An AB test not only tells you whether your change is effective but also separates its effect from the effects of other changes, like a sale.

To be sure that your changes drive business to your site, we recommend AB testing rather than comparing site data before and after changes.

Conflicting rules

AB testing lets you run experiments with otherwise conflicting rules. Site visitors are randomly assigned to an experimental group, and you can check the results as early and often as you like. If a particular group performs at the level you want for your organization, you can end the test and fully commit to its changes. Likewise, you can stop the test if you see that your proposed changes aren't having the effect you want.

For a before-and-after comparison, you first have to commit to your changes, because you can't run conflicting changes in production. For example, Mei is a digital marketer. She suspects that redirecting a "shoe" site search on her company's site to her Shoes category page will lead to more conversions than the "shoe" search results page. She can take a chance by deploying the redirect and hoping that her company sees a lift in shoe sales, or she can take a chance by doing nothing at all. Either way, she's taking a chance rather than making a data-driven decision. An AB test removes the gamble: she can show the redirect to a random subset of visitors and measure the difference in conversions directly.

What is supported?

Bloomreach currently supports AB testing for ranking changes and site search redirects. We do not currently support AB testing for ranking changes with audience targeting, facets, or assets/campaigns.

📘

Note

If you AB test a broad match ranking rule, the test will track all variations of the query that the broad match rule uses.

How do I start?

The quickest way to start a test is directly from a rule that you want to test. Click the Save New Test Variant button.

Steps to follow

  1. Click into the existing rule you would like to test, then click Save New Test Variant.
  2. When the Rule Variants menu opens, select the variants you want to test, then click the Setup New Test button. For example, if the variants table lists three variants, you might test Variant 3 against the default variant.

There are two other ways to start setting up an AB test.

Method 1

  1. Start a test from the Testing Overview page by clicking the Start New Test button in the upper right corner. The New Test Setup page opens, where you can give your test a name, allocate test traffic, and select the rule variant(s) that you would like to test.
  2. When you finish setting up your test, click the Activate button in the upper right corner of the New Test Setup page to start the test. While the test runs, you can return to the Testing Overview page at any time to check the results by clicking the View Stats button for your test.
  3. When you're ready to end the test, open the Test Detail page by clicking the test name, then click the End Test button in the upper right corner. When you end the test, you can choose to enact one of the test's experiences as the new default experience for all traffic to your site.

📘

No changes to site experiences

If you're not happy with the results for any of the experiences or otherwise don't want to set a new default experience for your site, then you can simply conclude the test without changing your site. Visitor experience on your site remains the same as it was before you started your test.

Method 2

  1. Start a test (for ranking rules) from the Ranking page by clicking the icon in the Variants column for the rule you would like to test.
  2. This takes you to the Rule Variants menu, where you select the variants you want to test and click the Setup New Test button.

Handling AB tests with more than one term in the same active test

The following FAQ covers AB testing scenarios where more than one term is included in the same active test:

1. How are AB tests handled when more than one term is included in the same active test?

A user is assigned to the test or control group at random when they perform their first activity relevant to the AB test. The user then remains in that bucket for the entire test period, for all of their qualifying activities. This is true for all tests.
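As a rough illustration of this sticky assignment (a sketch, not Bloomreach's actual implementation), the helper below picks a bucket at random on a visitor's first relevant activity and returns the same bucket for every activity after that:

```typescript
// Sketch of sticky bucket assignment: random on first relevant
// activity, then fixed for the rest of the test period.
const assignments = new Map<string, "A" | "B">(); // visitorId -> bucket

function getBucket(visitorId: string): "A" | "B" {
  let bucket = assignments.get(visitorId);
  if (bucket === undefined) {
    // First relevant activity: random 50/50 assignment...
    bucket = Math.random() < 0.5 ? "A" : "B";
    assignments.set(visitorId, bucket); // ...sticky thereafter.
  }
  return bucket;
}
```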

2. Are the results from both queries included in the test, or does the AB test only use the first term for the test?

Results metrics will include all the activities that are relevant to the test, so all queries are included.

3. Is the traffic split at 50/50 per query in the test, or is the traffic split just between test bucket A and bucket B regardless of the queries?

Users are split randomly (assumed to be 50/50) at the time of their first activity inside the test, whether that activity is one of several queries in a larger test or the only query in a single-query test.
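Continuing the hypothetical getBucket sketch above, you can sanity-check that such a split lands near 50/50 by assigning many visitor IDs and counting the buckets:

```typescript
// Assign 100,000 hypothetical visitors and count each bucket.
let countA = 0;
for (let i = 0; i < 100_000; i++) {
  if (getBucket(`visitor-${i}`) === "A") countA++;
}
console.log(`A: ${countA}, B: ${100_000 - countA}`); // roughly 50,000 each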

4. What would happen if there were far more queries in one of the test buckets?

This situation should not arise: because users are assigned to buckets at random (see point 1 above), query traffic evens out across buckets over time.

5. Does adding more queries to the test help increase traffic in the test buckets? Would this help reach significance quicker for tail queries?

Generally, adding more queries increases traffic to all buckets proportionally. Significance improves with greater lift or greater traffic, but tail queries contribute so little volume that adding more of them makes no meaningful difference to how quickly a test reaches significance.
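To make the relationship between lift, traffic, and significance concrete, here is a standard two-proportion z-test (an illustration only, not necessarily the statistic Bloomreach's reporting uses). With the same 10% to 11% lift, the larger sample clears the common |z| > 1.96 threshold while the smaller one does not:

```typescript
// Two-proportion z-test: same lift, different traffic volumes.
function zScore(convA: number, nA: number, convB: number, nB: number): number {
  const pA = convA / nA;
  const pB = convB / nB;
  const pooled = (convA + convB) / (nA + nB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / nA + 1 / nB));
  return (pB - pA) / se;
}

console.log(zScore(1_000, 10_000, 1_100, 10_000).toFixed(2)); // ~2.31: significant
console.log(zScore(100, 1_000, 110, 1_000).toFixed(2));       // ~0.73: not significant
```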