Evaluate predictions

When you run a prediction-driven campaign, two things affect the result: the model and the incentive. To understand each one's real impact, test them separately using a four-group structure that isolates what each factor contributes.

How the evaluation works

The four-group structure works by crossing two variables: prediction score (high or low probability) and campaign exposure (included or excluded). This lets you measure the model's ability to segment customers independently from the campaign's ability to drive conversions. Divide customers as follows (a code sketch of the assignment follows the table):

  1. High or low probability of reaching the target (based on the prediction).
  2. Included or excluded from the campaign.
                     Included in campaign    Not included in campaign
Low probability      Group A                 Group C
High probability     Group B                 Group D
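
To make the assignment concrete, here is a minimal Python sketch, assuming you already have a predicted probability per customer. The 0.5 score threshold, the 50/50 holdout split, and the field names are illustrative assumptions, not part of any Bloomreach API.

    import random

    def assign_group(score, threshold=0.5, holdout_rate=0.5):
        # score: predicted probability of reaching the target (0 to 1).
        # threshold: hypothetical cutoff separating low from high probability.
        # holdout_rate: fraction of customers held out of the campaign.
        high = score >= threshold
        included = random.random() >= holdout_rate  # randomize campaign exposure
        if included:
            return "B" if high else "A"  # included in campaign
        return "D" if high else "C"      # excluded from campaign (control)

    # Example: tag each customer with an evaluation group.
    customers = [{"id": 1, "score": 0.82}, {"id": 2, "score": 0.17}]
    for customer in customers:
        customer["group"] = assign_group(customer["score"])

Randomizing exposure within each probability band is what makes Groups C and D valid controls for Groups A and B.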

After a campaign runs, use the four groups to answer three questions (a worked example follows the list):

  1. Is the model working? Compare Group D versus Group C. If the high-probability control group outperforms the low-probability control group — with no campaign — the model is correctly identifying intent.
  2. Is the campaign working? Compare Groups A+B versus Groups C+D. This shows whether the campaign adds value at all, independent of prediction.
  3. Where does the campaign add the most lift? Compare uplift per segment. High segment: Group B versus Group D. Low segment: Group A versus Group C. Focus spend on the segment where the campaign drives the most additional conversions.
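
For illustration, suppose the four groups showed these hypothetical conversion rates: Group A 6%, Group B 12%, Group C 2%, Group D 8%. Group D outperforms Group C (8% versus 2%) with no campaign, so the model segments well. With equal group sizes, Groups A+B convert at 9% versus 5% for Groups C+D, so the campaign adds lift. And uplift in the low segment (6% / 2% = 3x) beats uplift in the high segment (12% / 8% = 1.5x), so in this example the incentive works hardest on customers the model scored as unlikely to convert.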

This framework protects against two mistakes: wasting budget on segments that convert anyway, and missing segments that respond strongly to the right offer.

Treat the recommendations in this article as hypotheses. Launch with an A/B test, measure the lift, then scale what works.

Calculate performance metrics

Calculate each group's performance using your chosen success metric, such as conversion rate or revenue per customer. Use the formulas below to isolate the contribution of the model and the incentive.

In the formulas below, Perf(X) refers to the performance metric for that group — for example, conversion rate.

Strength of model

Perf(D) / Perf(C)

Can the model tell apart customers who will reach the target from those who won't? Perf(D) / Perf(C) answers this by comparing high-probability customers who weren't exposed to the campaign against low-probability customers in the same conditions. A ratio well above 1 means the model segments customers effectively; a ratio near 1 means the prediction adds little signal.

Strength of impression

Perf(A+B) / Perf(C+D)

What lift does the campaign generate across all customers, regardless of their probability segment? Perf(A+B) / Perf(C+D) measures overall campaign effectiveness by combining the high- and low-probability groups to show the incentive's aggregate impact. A ratio above 1 means the campaign adds lift overall.

Strength of impression on high/low segment

High: Perf(B) / Perf(D)

Low: Perf(A) / Perf(C)

Which group benefits more from the campaign — high or low probability customers? Perf(B) / Perf(D) measures uplift for the high segment; Perf(A) / Perf(C) measures uplift for the low segment. If the low segment shows stronger uplift, the campaign is converting customers who wouldn't have acted otherwise — a sign that the incentive is doing meaningful work. If the high segment shows stronger uplift, the campaign may be rewarding customers who would have converted anyway.
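
Putting the three formulas together, here is a minimal Python sketch with conversion rate as the success metric; the per-group counts are hypothetical and reuse the rates from the worked example above.

    # Hypothetical per-group results: (conversions, customers).
    groups = {
        "A": (60, 1000),   # low probability, included in campaign
        "B": (120, 1000),  # high probability, included in campaign
        "C": (20, 1000),   # low probability, not included (control)
        "D": (80, 1000),   # high probability, not included (control)
    }

    def perf(*names):
        # Perf(X): pooled conversion rate across the named groups.
        conversions = sum(groups[n][0] for n in names)
        customers = sum(groups[n][1] for n in names)
        return conversions / customers

    strength_of_model = perf("D") / perf("C")                 # 0.08 / 0.02 = 4.0
    strength_of_impression = perf("A", "B") / perf("C", "D")  # 0.09 / 0.05 = 1.8
    uplift_high = perf("B") / perf("D")                       # 0.12 / 0.08 = 1.5
    uplift_low = perf("A") / perf("C")                        # 0.06 / 0.02 = 3.0

Here the low segment shows the stronger uplift (3.0 versus 1.5), so this hypothetical campaign earns more of its budget on low-probability customers.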

Related articles

Interpret prediction results: Learn what prediction scores mean and how to use them to inform campaign decisions.

