A/B Testing
What is A/B testing?
A/B testing (split testing) is a method of comparing two versions of a product, feature, or design to see which performs better. You split users randomly between a control and a variant, then compare results on a chosen metric.
Use it when: you have a clear hypothesis about a change and enough traffic to reach statistical significance within a reasonable time.
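How the random split is usually done in practice: a minimal sketch, assuming each user has a stable ID. Hashing the ID together with an experiment name keeps every user in the same group on every visit, which also guards against the contamination pitfall discussed later. The experiment name and bucket split below are placeholders.

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "cta-above-fold") -> str:
    """Deterministically assign a user to 'control' or 'variant'.

    Hashing the user ID with the experiment name gives a stable,
    roughly uniform 50/50 split, so the same user always sees the
    same version on repeat visits.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # 0-99
    return "control" if bucket < 50 else "variant"

# Same user, same group, every time
print(assign_variant("user-123"))
```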
Copy/paste template
- Hypothesis: If we [change], then [metric] will [improve] because [reason].
- Primary metric: [e.g. sign-up rate, checkout completion]
- Minimum sample size: [use a calculator; account for baseline and desired lift; see the sketch after this template]
- Duration: [e.g. 2 weeks, fixed in advance based on the sample size, not "until it looks significant"]
- Success criteria: [what you’ll ship vs. roll back]
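To fill in the minimum sample size line, the standard two-proportion formula gives a rough answer. A sketch, assuming scipy is available; the baseline rate, relative lift, significance level, and power below are placeholder numbers, not recommendations.

```python
from math import ceil, sqrt
from scipy.stats import norm

def sample_size_per_group(baseline: float, relative_lift: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate users needed per group for a two-proportion test."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = norm.ppf(1 - alpha / 2)   # two-sided significance
    z_beta = norm.ppf(power)            # desired statistical power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# Placeholder inputs: 5% baseline sign-up rate, hoping for a 10% relative lift
print(sample_size_per_group(baseline=0.05, relative_lift=0.10))
```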
Why A/B testing matters
- Reduces risk by testing with real users before a full rollout.
- Replaces opinion with evidence so decisions are defensible.
- Surfaces what actually moves the needle instead of what “should” work.
- Builds a culture of learning and iteration.
What a good A/B test includes
Checklist
- [ ] One clear hypothesis (one change, one primary metric).
- [ ] Enough traffic and time to reach statistical significance.
- [ ] A defined success threshold and plan to act on the result.
- [ ] No contamination (e.g. same user seeing both variants, or external campaigns skewing the sample).
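One practical guard against contamination is a sample ratio mismatch (SRM) check: if a 50/50 split produces group counts far from 50/50, the assignment or tracking is probably broken. A minimal sketch with invented counts, assuming scipy is available.

```python
from scipy.stats import chisquare

# Invented counts: users actually observed in each group of a 50/50 test
observed = [10_250, 9_760]
expected = [sum(observed) / 2] * 2  # what a true 50/50 split would give

stat, p_value = chisquare(observed, f_exp=expected)
if p_value < 0.001:
    print(f"Possible sample ratio mismatch (p={p_value:.2e}); investigate before trusting results")
else:
    print(f"Split looks consistent with 50/50 (p={p_value:.3f})")
```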
Common formats
- Single-element test: one variable (e.g. CTA copy, button colour). Easiest to interpret.
- Multivariate test: several elements at once. Use only when you have high traffic and need to explore combinations.
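In the multivariate case, the cells are the combinations of each element's options. A sketch with two hypothetical elements (CTA copy and button colour); the element names and options are invented for illustration.

```python
from itertools import product
import hashlib

# Hypothetical elements under test; each combination is one cell
cta_copy = ["Sign up free", "Start now"]
button_colour = ["green", "blue"]
cells = list(product(cta_copy, button_colour))  # 4 combinations

def assign_cell(user_id: str, experiment: str = "homepage-mvt"):
    """Deterministically map a user to one of the combinations."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return cells[int(digest, 16) % len(cells)]

print(assign_cell("user-123"))
```

With four cells, each sees only a quarter of the traffic instead of half, which is roughly why the single-element test is the default.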
Examples
Example (the realistic one)
You believe moving the main CTA above the fold will increase sign-ups. Hypothesis: “If we move the sign-up button above the fold, sign-up rate will increase because users won’t need to scroll to convert.” You run a 50/50 test for two weeks with sign-up rate as the primary metric, and decide in advance that you’ll ship only at 95% confidence with at least a 2% relative lift.
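Once the two weeks are up, that pre-agreed rule can be evaluated with a two-proportion z-test. A sketch with invented end-of-test counts; the 95% confidence and 2% lift thresholds are the ones set in the hypothesis above.

```python
from math import sqrt
from scipy.stats import norm

# Invented end-of-test counts for illustration
control_signups, control_users = 980, 24_000
variant_signups, variant_users = 1_090, 24_100

p_c = control_signups / control_users
p_v = variant_signups / variant_users

# Two-proportion z-test with pooled standard error
p_pool = (control_signups + variant_signups) / (control_users + variant_users)
se = sqrt(p_pool * (1 - p_pool) * (1 / control_users + 1 / variant_users))
z = (p_v - p_c) / se
p_value = 2 * (1 - norm.cdf(abs(z)))

relative_lift = (p_v - p_c) / p_c
ship = p_value < 0.05 and relative_lift >= 0.02  # pre-agreed decision rule
print(f"lift={relative_lift:.1%}, p={p_value:.3f}, ship={ship}")
```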
Common pitfalls
- Testing too many things at once: you can’t tell what drove the result. → Do this instead: one change per test, one primary metric.
- Stopping too early: peeking inflates false positives. → Do this instead: set sample size and duration up front; resist the urge to call the winner early (see the simulation sketch after this list).
- Ignoring practical significance: a tiny lift can be “significant” but irrelevant. → Do this instead: define a minimum lift that would justify the change.
- No hypothesis: running tests for the sake of it. → Do this instead: write “If we X, then Y will Z because…” before you build the variant.
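A quick simulation makes the “stopping too early” pitfall concrete: with no real difference between groups, peeking every 1,000 users and stopping at the first p < 0.05 declares a false winner far more often than 5% of the time. The conversion rate, peek schedule, and run count below are arbitrary.

```python
import numpy as np
from math import sqrt
from scipy.stats import norm

rng = np.random.default_rng(0)
baseline = 0.05                                  # same true rate in both groups
n_per_group = 20_000
checks = range(1_000, n_per_group + 1, 1_000)    # "peek" every 1,000 users

def p_value(a_conv, a_n, b_conv, b_n):
    """Two-proportion z-test p-value at an interim look."""
    p_pool = (a_conv + b_conv) / (a_n + b_n)
    se = sqrt(p_pool * (1 - p_pool) * (1 / a_n + 1 / b_n))
    z = ((b_conv / b_n) - (a_conv / a_n)) / se
    return 2 * (1 - norm.cdf(abs(z)))

false_positives = 0
runs = 500
for _ in range(runs):
    a = rng.random(n_per_group) < baseline
    b = rng.random(n_per_group) < baseline
    # Declare a "winner" at the first peek where p < 0.05
    if any(p_value(a[:n].sum(), n, b[:n].sum(), n) < 0.05 for n in checks):
        false_positives += 1

print(f"False-positive rate with peeking: {false_positives / runs:.0%}")  # well above 5%
```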
A/B testing vs. related concepts
- A/B testing vs multivariate testing: A/B compares two versions; multivariate tests multiple elements at once. Start with A/B unless you have high traffic and a clear reason.
- A/B testing vs usability testing: A/B measures behaviour at scale; usability testing explains why. Use both: usability for insight, A/B for validation.
Related terms
- Experimentation – broader practice of testing ideas; A/B is one method.
- Telemetry – instrumentation you need to measure test outcomes.
- Problem statement – define the problem before you A/B test a solution.
- User research – qualitative input to form hypotheses worth testing.
- Minimum viable product – ship the smallest thing, then use A/B to improve it.
- Release habits – how you ship and roll back; affects how safely you can run tests.
- Feature prioritisation – what to build next; A/B helps you learn what to keep or change.
Next step
If you’re forming hypotheses from user feedback, read User research. If you’re ready to run a test, lock your hypothesis and primary metric using the template above.