4. Retrieve and evaluate A/B test results
While the test runs, you can return to the A/B tests page anytime to check the results by clicking the Open button next to the test.
When are the results visible?
Results become available 24 hours after the test starts and continue to update for as long as the test runs. The update schedule can vary across customer groups: the data may reflect updates from a few hours ago or, at most, from 24 hours prior.

[Image: Example of test results]
Roles to access test reports
You must have at least one of these IAM roles to view the A/B test reports:
- Categories merchandising editor
- Global merchandising editor
- Search merchandising editor
- SEO thematic pages editor
- SEO widgets editor
Steps to interpret A/B test results
Follow these steps to interpret your A/B test results:
1. Review metrics
Check key metrics that are relevant to your business goals. Compare the metrics between the test variant and control group. The available metrics are listed below:
- Visitors
- ATC
- Conversion
- Revenue
- Visits
- AOV (Lift | Confidence %)
- ATC Rate (Lift | Confidence %)
- CR (Lift | Confidence %)
- RPV (Lift | Confidence %)
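To see how the derived metrics relate to the raw counts, here is a minimal sketch using the standard definitions of these metrics and made-up bucket totals. It is illustrative only; the results panel computes these values for you, and its exact formulas may differ.

```python
# Minimal sketch: standard definitions of the derived metrics, computed
# from hypothetical bucket totals. The panel's exact formulas may differ.
visitors = 12_000        # unique visitors in the bucket
visits = 15_400          # total sessions
atc_visitors = 1_850     # visitors who added an item to cart
orders = 540             # visitors who converted
revenue = 48_600.00      # revenue attributed to the bucket

atc_rate = atc_visitors / visitors   # ATC Rate
cr = orders / visitors               # Conversion Rate (CR)
rpv = revenue / visitors             # Revenue per Visitor (RPV)
aov = revenue / orders               # Average Order Value (AOV)

print(f"ATC Rate {atc_rate:.2%} | CR {cr:.2%} | RPV ${rpv:.2f} | AOV ${aov:.2f}")
```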
2. Examine lift
Look at the lift value to see how much the test variant improved over the control.
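Lift is conventionally the relative improvement of the variant over the control. A one-line sketch with made-up conversion rates:

```python
# Relative lift: (variant - control) / control, with made-up rates.
control_cr, variant_cr = 0.0420, 0.0441

lift = (variant_cr - control_cr) / control_cr
print(f"Lift: {lift:+.1%}")   # +5.0%: the variant converts 5% better than control
```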

3. Check confidence level
The confidence level in the results panel indicates the statistical certainty of the lift. A higher percentage (more than 90%) suggests the results are reliable.

4. Analyze graphs
Use graphs to see trends over time and spot any patterns.

5. Make decisions
- If the lift is positive and the confidence level is high, consider implementing the changes from the test variant on your site.
- If results are inconclusive, consider ending the test and starting a new one.
Troubleshooting
1. Why do I see a different traffic split?
You may see a different or uneven traffic split due to the following causes:
- Caching issues: Your current caching strategy might interfere with the proper traffic allocation between the test groups.
- Empty or skewed _br_uid_2 parameter: Sending traffic with empty or skewed _br_uid_2 parameters can result in an inaccurate split of the test groups.
- Low-volume query: Low traffic can make it hard to see the impact of the test and properly bucket the traffic.
We suggest that you investigate these issues. For further help, contact Bloomreach Support.
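To illustrate the _br_uid_2 issue, here is a simplified sketch of deterministic hash-based bucketing. It assumes assignment hashes the _br_uid_2 cookie value; the actual assignment logic may differ, but the failure mode is the same: every request with an empty or constant _br_uid_2 hashes to the same bucket.

```python
import hashlib

def bucket(br_uid_2: str, split: float = 0.5) -> str:
    """Deterministically assign a visitor to a test group.

    Simplified illustration; the real assignment logic may differ.
    """
    digest = hashlib.md5(br_uid_2.encode()).hexdigest()
    fraction = int(digest[:8], 16) / 0xFFFFFFFF   # map hash to [0, 1]
    return "variant" if fraction < split else "control"

# An empty _br_uid_2 always hashes to the same value, so all such
# traffic lands in one bucket and skews the 50/50 split.
print(bucket(""))                                   # same bucket every time
print(bucket("uid=7797686432023:v=11.5:ts=0:hc=3")) # hypothetical cookie value
```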
2. Why do I see 0% values?
You may see a change of 0% in certain metrics, which likely means there is no statistically significant difference between the control and test buckets for those specific metrics. This could be due to:
- Small effect size: The actual change (test applicability or traffic impacted) is too small to move the needle or show statistical significance.
- Confidence intervals: The confidence interval might include 0, suggesting no clear difference.
- Lack of sensitivity in the metric: The selected metric may not be sensitive enough to capture changes caused by the given test parameters.
Best practices to assess the validity of test results (Optional reading)
Note that while these are general best practices, they can differ based on factors like your business goals and business size.
Assessing KPIs, lift, and confidence
- Guiding metric: Your guiding metric depends on what you're trying to test and your business's goals. For example, if you're trying to improve relevance, you might focus more on the ATC (Add to Cart) rate. If you're focusing on increasing revenue with merchandising, you might look at Conversion rate and RPV.
- Confidence: Use the confidence level to determine when a test is over and there is a clear winner. A confidence level of over 90% is a good benchmark.
- Assessing lift: The lift threshold can vary based on your business size. A small or mid-sized business may see a 5% lift, while a large enterprise business may see a 0.05% lift. At enterprise scale, this seemingly small percent lift can still translate into millions of dollars in incremental revenue, as the quick calculation below illustrates. It's crucial that the lift is statistically significant; significance can be confirmed when the confidence is over 90%.
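Here is the back-of-the-envelope arithmetic behind that claim; both figures are assumptions for illustration:

```python
# Back-of-the-envelope: value of a small relative lift at enterprise
# scale. Both figures are assumptions for illustration.
annual_revenue = 2_000_000_000   # $2B flowing through the tested experience
relative_lift = 0.0005           # a 0.05% lift

print(f"${annual_revenue * relative_lift:,.0f}")   # $1,000,000 incremental revenue
```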
Note
Statistical significance and confidence level
Statistical significance is the probability that the results are not due to random chance, represented by a p-value. A p-value of less than 0.10 means there is less than a 10% chance that the results are random, making them statistically significant.
Confidence level expresses, as a percentage, how certain you can be in the results. Suppose you estimate that your ATC rate will improve between 3% and 4%. A 90% confidence level means that if you repeated the experiment 100 times, the result would fall within this range 90 times.
The A/B test results panel shows the confidence level. Confidence level and statistical significance are distinct but related concepts: a confidence level of 90% corresponds to a p-value of less than 0.10, so you can use the confidence level to deduce the significance of the results.
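To make that relationship concrete, here is a sketch of a standard two-proportion z-test on conversion rate, with made-up counts. This is a textbook test, not necessarily the exact method the results panel uses; it shows how a p-value maps to a confidence level.

```python
from math import sqrt
from statistics import NormalDist

# Standard two-proportion z-test on conversion rate, with made-up counts.
# Not necessarily the exact method the results panel uses.
control_orders, control_visitors = 500, 12_000
variant_orders, variant_visitors = 555, 12_000

p1 = control_orders / control_visitors
p2 = variant_orders / variant_visitors
pooled = (control_orders + variant_orders) / (control_visitors + variant_visitors)
se = sqrt(pooled * (1 - pooled) * (1 / control_visitors + 1 / variant_visitors))

z = (p2 - p1) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value

print(f"lift {(p2 - p1) / p1:+.1%} | p-value {p_value:.3f} | "
      f"confidence {1 - p_value:.0%}")
# lift +11.0% | p-value ~0.083 | confidence ~92% -> above the 90% benchmark
```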
Factors affecting confidence
- Scale of changes: A test may not reach confidence if there isn't much variation between the two experiences. For example, if you lock just 2 products and then A/B test a category page, the experiences are still very similar and will likely not produce a conclusive winner.
- Traffic: It takes longer to reach confidence if you're testing a low-traffic query or category, or if you're A/B testing only a small portion of your traffic (the rough sample-size sketch after this list shows why). Bloomreach recommends testing with 100% of traffic for most tests where you feel fairly confident. Only run the test with less traffic if the change is significant and you are hesitant to expose it to half of your traffic.
- Testing merchandising strategy: If your goal is to test an overarching merchandising strategy, like boosting sales, newness, or low stock levels, or optimizing for a specific signal like Conversion or Revenue, we recommend testing at a global level to reach confidence faster.
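To see why low traffic slows things down, here is a rough sample-size sketch using the standard two-proportion approximation. The parameters (90% confidence, 80% power, the baseline rate, the target lift, and the daily traffic) are all assumptions for illustration:

```python
from math import ceil, sqrt
from statistics import NormalDist

def visitors_needed(baseline_cr: float, relative_lift: float,
                    alpha: float = 0.10, power: float = 0.80) -> int:
    """Rough per-bucket sample size to detect a relative lift in CR.

    Standard two-proportion approximation; illustrative only.
    """
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # 90% confidence
    z_b = NormalDist().inv_cdf(power)           # 80% power
    p1, p2 = baseline_cr, baseline_cr * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    n = (z_a * sqrt(2 * p_bar * (1 - p_bar))
         + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / (p2 - p1) ** 2
    return ceil(n)

# Detecting a 5% relative lift on a 4% baseline CR takes ~122,000
# visitors per bucket; testing only part of your traffic stretches
# the calendar time needed to collect that sample.
n = visitors_needed(baseline_cr=0.04, relative_lift=0.05)
print(n, "visitors per bucket")
print(f"~{n / 5_000:.0f} days at 5,000 visitors per bucket per day")
```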