A/B Test Design Template: Hypothesis, Metrics, Sample Size, and QA

Why the design document matters

A/B test analysis usually goes wrong before the analysis starts. If the hypothesis is vague, the primary metric changes after launch, or the traffic split is not checked, the final report will be hard to trust.

This template helps analysts, product managers, and marketers agree on the experiment setup before traffic is exposed. It is designed for practical business tests such as checkout changes, onboarding flows, pricing pages, landing pages, promotion messages, and lifecycle campaigns.

Copy this A/B test design template

Experiment name:
Business problem:
Hypothesis:
Control experience:
Treatment experience:
Target users:
Randomization unit:
Traffic split:
Primary metric:
Guardrail metrics:
Minimum detectable effect:
Planned sample size:
Experiment dates:
Segment plan:
Decision rule:
Pre-launch QA owner:

Write a measurable hypothesis

A weak hypothesis says, "The new page will perform better." A useful hypothesis explains the mechanism and the metric.

Weak:
The new checkout page will improve conversion.

Better:
Reducing the number of checkout form fields will reduce friction for mobile users
and increase checkout completion rate without increasing refund rate.

Define the randomization unit

The randomization unit is the entity assigned to control or treatment. Common choices are user, account, session, device, store, or geographic region. The wrong unit can contaminate the test.

If the same user can see both variants, your result may measure confusion rather than product impact. For most digital product tests, user-level or account-level assignment is safer than session-level assignment.
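One common way to keep a user in the same variant across sessions is to derive the assignment from a hash of the user ID. The sketch below is a minimal illustration, not a description of any particular experimentation platform; the function name `assign_variant`, the `treatment_share` parameter, and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib


def assign_variant(user_id: str, experiment_id: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to 'control' or 'treatment' (hypothetical helper)."""
    # Salt with the experiment ID so the same user can land in different
    # variants in different experiments.
    key = f"{experiment_id}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # First 8 hex characters -> integer in [0, 2**32), scaled to [0, 1).
    bucket = int(digest[:8], 16) / 2**32
    return "treatment" if bucket < treatment_share else "control"


# The same user always gets the same answer, regardless of session or device.
first_call = assign_variant("user_123", "checkout_form_test")
second_call = assign_variant("user_123", "checkout_form_test")
```

Because the assignment depends only on the user ID and experiment ID, it needs no lookup table. Session-level assignment would hash a session ID instead, which is what lets one user see both variants.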

Choose one primary metric

The primary metric should match the business decision. If the test changes checkout copy, checkout completion rate may be primary. If the test changes a pricing page, revenue per visitor may be more useful than clicks.

Secondary metrics can explain the movement, but they should not all become decision metrics. If every metric is primary, the team can always find one number that supports its preferred decision.

Define guardrail metrics

Guardrails protect the business from launching a change that improves one number while damaging another. For example, a discount message may lift conversion but hurt margin, refund rate, or long-term retention.

Conversion tests: refund rate, complaint rate, payment failure rate.
Revenue tests: gross margin, cancellation rate, repeat purchase rate.
Onboarding tests: activation quality, support tickets, churn.
Performance tests: page speed, crash rate, latency.

Sample size planning

Before launch, estimate the baseline conversion rate and expected eligible traffic, and choose the minimum detectable effect, significance level, and power. If the required sample is far larger than your traffic can supply in a reasonable test window, redesign the test instead of running an underpowered experiment.

Sample-size inputs:
baseline_conversion_rate = 8.0%
minimum_detectable_effect = 10% relative lift
alpha = 0.05
power = 0.80
expected_daily_users = 4,000
traffic_allocation = 50% / 50%
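These inputs can be turned into a rough per-variant sample size. The sketch below uses the standard normal-approximation formula for comparing two proportions with a two-sided test and equal allocation. It is one common approach, not the only one, and the function name `sample_size_per_variant` is made up for this example. It also assumes every one of the expected daily users is eligible and enters the test.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_mde, alpha=0.05, power=0.80):
    """Approximate users needed per variant for a two-sided two-proportion z-test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)  # treatment rate if the lift is real
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the desired power
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


# Inputs from the list above: 8.0% baseline, 10% relative lift, alpha 0.05, power 0.80.
n_per_variant = sample_size_per_variant(0.08, 0.10)

# Assumption: all 4,000 daily users are eligible and split 50/50.
expected_daily_users = 4000
days_needed = math.ceil(2 * n_per_variant / expected_daily_users)
```

With these inputs the sketch lands at roughly 19,000 users per variant, which is on the order of ten days of traffic. Halving the minimum detectable effect roughly quadruples the required sample, which is usually the first thing to check when a test looks too slow.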

SQL QA check before launch

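Use a simple query to confirm assignment volume and duplicate assignment rows within each variant. Because the query groups by variant, it will not flag a user who was assigned to both variants, and it does not look at conversion events, so check those separately. Run these checks early, before the experiment result feeds a high-stakes launch decision.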

SELECT
  variant,
  COUNT(*) AS assignment_rows,
  COUNT(DISTINCT user_id) AS assigned_users,
  COUNT(*) - COUNT(DISTINCT user_id) AS duplicate_rows
FROM experiment_assignments
WHERE experiment_id = 'checkout_form_test'
GROUP BY 1
ORDER BY 1;
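The query above groups by variant, so a user with one row in each variant shows zero duplicates. A complementary check groups by user instead. The sketch below runs that check against an in-memory SQLite database using Python's built-in `sqlite3` module. The column types and the four sample rows are invented for illustration; only the table and column names come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: the source only shows these three column names.
conn.execute(
    "CREATE TABLE experiment_assignments (experiment_id TEXT, user_id TEXT, variant TEXT)"
)
# Made-up rows for illustration only.
sample_rows = [
    ("checkout_form_test", "u1", "control"),
    ("checkout_form_test", "u2", "treatment"),
    ("checkout_form_test", "u3", "control"),
    ("checkout_form_test", "u3", "treatment"),  # u3 appears in both variants
]
conn.executemany("INSERT INTO experiment_assignments VALUES (?, ?, ?)", sample_rows)


def users_in_multiple_variants(connection, experiment_id):
    """Return (user_id, variants_seen) for users assigned to more than one variant."""
    query = """
        SELECT user_id, COUNT(DISTINCT variant) AS variants_seen
        FROM experiment_assignments
        WHERE experiment_id = ?
        GROUP BY user_id
        HAVING COUNT(DISTINCT variant) > 1
        ORDER BY user_id
    """
    return connection.execute(query, (experiment_id,)).fetchall()


contaminated = users_in_multiple_variants(conn, "checkout_form_test")
```

An empty result is the healthy case. Any rows returned point to users whose experience was mixed, which is the contamination risk described under the randomization unit.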

Traffic split check

A 50/50 test should look close to 50/50 once enough traffic arrives for random noise to average out. A large mismatch, often called a sample ratio mismatch, may indicate targeting, assignment, or logging issues.

SELECT
  variant,
  COUNT(DISTINCT user_id) AS users,
  1.0 * COUNT(DISTINCT user_id)
    / SUM(COUNT(DISTINCT user_id)) OVER () AS traffic_share
FROM experiment_assignments
WHERE experiment_id = 'checkout_form_test'
GROUP BY 1;
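Whether a split is "close enough" to 50/50 can be checked with a chi-square goodness-of-fit test on the two user counts. The sketch below is a minimal standard-library version for two variants. The function name `srm_p_value` and the example counts are hypothetical, and the threshold for raising an alarm is a team convention rather than a fixed rule.

```python
import math


def srm_p_value(control_users, treatment_users, expected_control_share=0.5):
    """P-value of a chi-square goodness-of-fit test for a two-variant split."""
    total = control_users + treatment_users
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = (
        (control_users - expected_control) ** 2 / expected_control
        + (treatment_users - expected_treatment) ** 2 / expected_treatment
    )
    # With 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts: a small gap versus a suspicious one.
small_gap = srm_p_value(5000, 5050)
large_gap = srm_p_value(5000, 5400)
```

A very small p-value means the observed split is unlikely under the planned allocation, so the assignment and logging pipeline should be investigated before any metric is read. Teams often use a strict threshold for this check because a false alarm only costs an investigation, while a missed mismatch can invalidate the result.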

Decision rule examples

Launch if the primary metric improves at the planned significance level and all guardrails are stable.
Do not launch if guardrails worsen, even if the primary metric improves.
Extend the test if sample size is below plan and no guardrail risk appears.
Rerun the test if tracking or assignment was broken.
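Writing the rules as a small function before launch makes the order of precedence explicit. The sketch below encodes the four examples above. The names `Decision` and `decide` are made up, the ordering of the checks is an assumption, and so is the final fallback (do not launch when the sample is complete but the primary metric did not improve), which the examples do not cover.

```python
from enum import Enum


class Decision(Enum):
    LAUNCH = "launch"
    DO_NOT_LAUNCH = "do not launch"
    EXTEND = "extend"
    RERUN = "rerun"


def decide(primary_improved, guardrails_stable, sample_size_reached, tracking_ok):
    """Apply the example decision rules in an assumed order of precedence."""
    if not tracking_ok:
        # Broken tracking or assignment invalidates everything else.
        return Decision.RERUN
    if not guardrails_stable:
        # Guardrail damage blocks launch even if the primary metric improved.
        return Decision.DO_NOT_LAUNCH
    if not sample_size_reached:
        # Below planned sample size and no guardrail risk: keep collecting.
        return Decision.EXTEND
    if primary_improved:
        return Decision.LAUNCH
    # Assumed fallback: full sample, stable guardrails, no improvement.
    return Decision.DO_NOT_LAUNCH


outcome = decide(
    primary_improved=True,
    guardrails_stable=True,
    sample_size_reached=True,
    tracking_ok=True,
)
```

Agreeing on the inputs and their order in the design document removes most of the room for arguing about the decision after the results are in.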

Download the workbook and checklist

The workbook includes an A/B Test Design tab for hypothesis, metrics, sample size notes, traffic split, and decision rules. Use the launch checklist as the final QA pass before exposing users.
