How to Write an A/B Test Analysis Report
A useful experiment report does more than say which group won. It explains impact, uncertainty, segment patterns, and what the business should do next.
Start with the conclusion
Executives and product teams need the decision first. Summarize whether the treatment improved the primary metric, whether the result is statistically reliable, and whether you recommend launch, rollback, or more observation.
Recommendation: Launch to 100% of eligible users.
Why: treatment conversion rate rose from 8.2% to 8.9%, a relative lift of 8.5%.
Confidence: the result is statistically significant (p = 0.04) and guardrail metrics are stable.
Next step: monitor refund rate and repeat behavior for two weeks after rollout.
Show the experiment context
Include the experiment background, control version, treatment version, target users, date range, and traffic allocation. This makes the report auditable.
Report overall impact
For each metric, show control value, treatment value, absolute change, relative lift, p-value or confidence interval, and sample size. Avoid reporting percentages without the counts behind them, because a rate alone hides how much data supports it.
metric          | control | treatment | relative lift | p-value | decision
conversion_rate | 8.2%    | 8.9%      | +8.5%         | 0.04    | launch candidate
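The columns in a row like this can be derived from raw counts. The following is a minimal sketch, assuming a two-proportion z-test is an acceptable method for the metric. The function name `compare_rates` and the user counts are hypothetical: the counts were chosen only to reproduce the 8.2% and 8.9% rates, so the p-value they yield need not match the illustrative 0.04 in the table.

```python
from math import sqrt
from statistics import NormalDist


def compare_rates(control_conv, control_n, treat_conv, treat_n, alpha=0.05):
    """Two-proportion z-test plus a confidence interval for the rate difference."""
    p_c = control_conv / control_n
    p_t = treat_conv / treat_n
    abs_change = p_t - p_c
    rel_lift = abs_change / p_c

    # Pooled standard error for the hypothesis test (H0: both rates are equal).
    p_pool = (control_conv + treat_conv) / (control_n + treat_n)
    se_pool = sqrt(p_pool * (1 - p_pool) * (1 / control_n + 1 / treat_n))
    z = abs_change / se_pool
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for the interval around the observed difference.
    se_diff = sqrt(p_c * (1 - p_c) / control_n + p_t * (1 - p_t) / treat_n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (abs_change - z_crit * se_diff, abs_change + z_crit * se_diff)

    return {
        "control_rate": p_c,
        "treatment_rate": p_t,
        "absolute_change": abs_change,
        "relative_lift": rel_lift,
        "p_value": p_value,
        "ci": ci,
    }


# Hypothetical counts (not from any real experiment).
result = compare_rates(control_conv=1640, control_n=20000,
                       treat_conv=1780, treat_n=20000)
for key, value in result.items():
    print(key, value)
```

Reporting the interval for the absolute change next to the relative lift lets readers see both the size of the effect and how uncertain it is.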
Check segments and process metrics
Look at key segments such as user type, device, channel, geography, or funnel step. A treatment can lift the headline metric while hurting a critical user group.
Segment analysis should explain the result, not turn into a fishing expedition. Start with the segments that were named in the design document. Add exploratory cuts only if they are clearly labeled as exploratory.
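A segment table can carry the planned-versus-exploratory label directly. The sketch below uses made-up segment names and counts (all hypothetical) to show a case where the planned segments combine to a positive lift while one of them moves in the wrong direction.

```python
# Hypothetical per-segment data: (users, conversions) for each group.
# "planned" marks segments named in the design document.
segments = {
    "new_users": {"control": (6000, 420), "treatment": (6000, 540), "planned": True},
    "returning_users": {"control": (4000, 400), "treatment": (4000, 360), "planned": True},
    "tablet": {"control": (500, 40), "treatment": (500, 46), "planned": False},
}


def segment_report(segments):
    """Return one row per segment with rates, relative lift, and a label."""
    rows = []
    for name, seg in segments.items():
        c_users, c_conv = seg["control"]
        t_users, t_conv = seg["treatment"]
        c_rate = c_conv / c_users
        t_rate = t_conv / t_users
        rows.append({
            "segment": name,
            "control_rate": c_rate,
            "treatment_rate": t_rate,
            "relative_lift": (t_rate - c_rate) / c_rate,
            "label": "planned" if seg["planned"] else "exploratory",
        })
    return rows


report = segment_report(segments)
for row in report:
    print(f"{row['segment']:<16} {row['label']:<12} "
          f"{row['control_rate']:.1%} -> {row['treatment_rate']:.1%} "
          f"({row['relative_lift']:+.1%})")
```

In this toy data the returning-user segment declines even though new users improve, which is the kind of pattern a headline number hides. Small exploratory segments such as the tablet cut rest on few users, so their lifts are best read as hypotheses for a follow-up test.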
End with a recommendation
State the decision and the next measurement plan. If the result is inconclusive, explain whether the issue is traffic, effect size, metric noise, or experiment design.
How to explain an inconclusive test
An inconclusive test is not a failure. It can mean the effect is smaller than expected, the sample size was too low, the metric is noisy, or the product change did not alter user behavior. The report should say which explanation is most likely.
- If traffic was too low, recommend extending the test or increasing allocation.
- If guardrails worsened, recommend rollback even if the primary metric improved.
- If the effect was small but positive, recommend a cost-benefit review before launch.
- If tracking was broken, mark the result invalid and rerun the experiment.
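To judge whether traffic was too low, one option is to estimate how many users per group the test would have needed. This is a rough sketch using the standard normal-approximation formula for comparing two proportions. The function name, the 80% power target, and the observed traffic figure are assumptions for illustration, not part of the report template.

```python
from math import ceil
from statistics import NormalDist


def required_users_per_group(baseline_rate, expected_rate, alpha=0.05, power=0.80):
    """Approximate users needed per group to detect the given rate change."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = (baseline_rate * (1 - baseline_rate)
                + expected_rate * (1 - expected_rate))
    effect = expected_rate - baseline_rate
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Rates taken from the example table; the observed traffic is hypothetical.
needed = required_users_per_group(baseline_rate=0.082, expected_rate=0.089)
observed_per_group = 12000

if observed_per_group < needed:
    print(f"Underpowered: {observed_per_group} users per group, about {needed} needed.")
else:
    print(f"Traffic was sufficient: about {needed} users per group needed.")
```

If the observed traffic falls well short of the estimate, "extend the test or increase allocation" is the more defensible reading than "the change had no effect".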
SQL example
This query summarizes control and treatment performance. Add segment columns when you need channel, device, or user-type analysis.
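-- Users assigned to each arm of the experiment.
WITH assigned_users AS (
    SELECT user_id, experiment_group
    FROM experiment_assignments
    WHERE experiment_name = 'reward_page_test'
),
-- Users who completed at least one deposit. This counts deposits at any time;
-- restrict to events after assignment if your schema records timestamps.
conversions AS (
    SELECT DISTINCT user_id
    FROM events
    WHERE event_name = 'deposit_complete'
)
SELECT
    assigned_users.experiment_group,
    COUNT(DISTINCT assigned_users.user_id) AS users,
    -- Unconverted users join to NULL, which COUNT ignores.
    COUNT(DISTINCT conversions.user_id) AS conversions,
    -- Multiply by 1.0 to force decimal rather than integer division.
    1.0 * COUNT(DISTINCT conversions.user_id)
        / COUNT(DISTINCT assigned_users.user_id) AS conversion_rate
FROM assigned_users
LEFT JOIN conversions
    ON assigned_users.user_id = conversions.user_id
GROUP BY 1;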
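One way to sanity-check a query like this before running it on production data is to execute it against a few hand-made rows. The sketch below does that with Python's built-in `sqlite3` module. The table layouts and sample rows are assumptions, since the real schema is not shown here, and the query spells out the grouping column instead of using a positional reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schemas; real tables will have more columns.
conn.executescript("""
    CREATE TABLE experiment_assignments (
        user_id INTEGER, experiment_group TEXT, experiment_name TEXT);
    CREATE TABLE events (user_id INTEGER, event_name TEXT);
""")
conn.executemany(
    "INSERT INTO experiment_assignments VALUES (?, ?, 'reward_page_test')",
    [(1, "control"), (2, "control"), (3, "control"), (4, "control"),
     (5, "treatment"), (6, "treatment"), (7, "treatment"), (8, "treatment")],
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "deposit_complete"), (1, "deposit_complete"),  # repeat deposit, counted once
     (5, "deposit_complete"), (6, "deposit_complete"),
     (7, "page_view"),                                  # not a conversion event
     (99, "deposit_complete")],                         # user not in the experiment
)

query = """
WITH assigned_users AS (
    SELECT user_id, experiment_group
    FROM experiment_assignments
    WHERE experiment_name = 'reward_page_test'
),
conversions AS (
    SELECT DISTINCT user_id
    FROM events
    WHERE event_name = 'deposit_complete'
)
SELECT
    assigned_users.experiment_group,
    COUNT(DISTINCT assigned_users.user_id) AS users,
    COUNT(DISTINCT conversions.user_id) AS conversions,
    1.0 * COUNT(DISTINCT conversions.user_id)
        / COUNT(DISTINCT assigned_users.user_id) AS conversion_rate
FROM assigned_users
LEFT JOIN conversions
    ON assigned_users.user_id = conversions.user_id
GROUP BY assigned_users.experiment_group
ORDER BY assigned_users.experiment_group;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The toy rows include a repeat deposit, a non-conversion event, and a user outside the experiment, so the check covers the deduplication and the left join.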
Template and chart
The workbook includes an A/B Test Analysis tab with control versus treatment rates, relative lift formulas, a decision column, and a chart-ready summary.
Common mistakes
- Reporting only relative lift without showing baseline and counts.
- Ignoring guardrails because the primary metric improved.
- Mixing users assigned before and after tracking changes.
- Making a launch decision from one high-performing segment.
- Using screenshots instead of a reproducible query or workbook.