How to A/B test a QR code campaign without fooling yourself

A printed QR code can't be randomised like a web page, so the obvious two-poster test measures placement, not creative. How to A/B test a QR campaign properly: split the destination behind a dynamic code, judge on conversion rate, and be honest about significance.

ScanKit

Every agency has had the meeting where someone insists the blue poster outperformed the orange one, and someone else insists it was the other way round, and nobody has the data to settle it. A/B testing is how you replace that argument with evidence. Run two versions, measure which one wins, keep the winner.

The catch with QR codes is that a printed code is fixed. You cannot quietly show half your audience one version and half the other the way a website does, so the obvious test, two posters with two designs, is also the easiest one to get wrong. This guide shows the testing methods that actually produce a trustworthy result, the ones that only look like tests, and a workflow you can reuse on every client campaign.

It assumes you already have tracking in place. If you do not, set that up first: a code that reports nothing cannot be tested. The companion pieces on tracking QR scans in Google Analytics 4 and which scan metrics actually matter are the foundation this one builds on.

What A/B testing actually means, and the one rule everyone breaks

An A/B test compares two versions of something that differ in exactly one way, shows them to comparable audiences at the same time, and uses the numbers to decide which performs better. Optimizely's own definition stresses the two load-bearing phrases: variants are shown "at random" and the winner is chosen by "statistical analysis", not by eye.

Two principles follow, and both get broken constantly.

First, change one variable at a time. If version B has a new headline and a new image and a new offer, and it wins, you have learned nothing you can reuse, because you do not know which change did the work. Test the headline, or the image, or the offer. One thing.

Second, pick one primary metric before you start, and make it a rate tied to your goal rather than raw volume. "More scans" is a tempting metric and a misleading one, because scan counts mostly measure where you put the code and how busy the spot was, not whether the creative persuaded anyone. The honest metric is a conversion rate: of the people who scanned, what share completed the action you wanted? That is the number a variant can genuinely move.
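
To make the gap between volume and rate concrete, here is a minimal Python sketch. The variant labels and every count in it are invented for illustration; they are not campaign data.

```python
def conversion_rate(conversions: int, scans: int) -> float:
    """Share of scans that completed the goal; 0.0 when there were no scans."""
    return conversions / scans if scans else 0.0


# Illustrative counts only, not real campaign results.
results = {
    "hero-offer": {"scans": 400, "conversions": 20},
    "hero-testimonial": {"scans": 250, "conversions": 20},
}

rates = {
    label: conversion_rate(r["conversions"], r["scans"])
    for label, r in results.items()
}

most_scans = max(results, key=lambda label: results[label]["scans"])
best_rate = max(rates, key=rates.get)

print(most_scans)  # hero-offer
print(best_rate)   # hero-testimonial
```

In this made-up case the variant with the most scans converts at 5 per cent and the quieter one at 8 per cent, so judging on volume would crown the wrong one.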

Why a printed QR code breaks the textbook test

Here is the problem that makes QR testing different from web testing. A valid A/B test needs random assignment: each person has an equal chance of seeing either variant, so the two groups are otherwise identical. A website does this server-side on every visit. A printed poster cannot. Whoever walks past sees whatever was printed.

So the moment you test two designs by putting poster A in one location and poster B in another, you have confounded the design with the placement. Poster B might win because its creative was better, or because it sat at eye level by a queue while poster A was above a doorway nobody waited under. Footfall, dwell time, lighting, height, weather, and the local audience all differ between two physical spots, and the test cannot separate any of them from the thing you meant to measure. You get a number, but it does not mean what you think it means.

This is not a reason to give up on testing QR campaigns. It is a reason to test the part you can randomise, and to be honest about the part you cannot.

The clean method: test the destination, not the code

The breakthrough is to stop trying to test the physical artefact and test what happens after the scan instead. With a dynamic QR code, the printed code encodes only a short redirect URL. The real destination lives on the server and is yours to control. That is where a clean test becomes possible.

The gold standard is server-side traffic splitting: a single printed code whose redirect sends a random share of scans to landing page A and the rest to landing page B. Because the same stream of scanners is divided at the moment of the redirect, the two groups are genuinely random and exposed at the same time. This is the only QR A/B test that satisfies the textbook conditions, and it is the one to reach for when your tooling supports it.

Diagram: a clean QR A/B test, with one dynamic code (1), a redirect that splits the scan stream (2), two destination variants (3), and the conversion rate that picks the winner (4).

The diagram above shows the shape of it:

  1. One dynamic code, the same printed code for everyone, so placement is identical for both variants.
  2. The redirect, where the scan stream is split between the two destinations.
  3. Two destinations that differ in one thing: page A and page B.
  4. The measurement, where you compare the conversion rate of each and keep the winner.
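
The four steps above can be sketched in a few lines of Python. This is a hypothetical redirect handler you would run on your own server, not the API of any particular QR tool; the URLs, the function names, and the 50/50 default are all assumptions made for illustration.

```python
import random

# Hypothetical destinations; swap in your own landing pages.
DESTINATIONS = {
    "A": "https://example.com/landing-a",
    "B": "https://example.com/landing-b",
}


def choose_variant(rng: random.Random, share_a: float = 0.5) -> str:
    """Pick 'A' with probability share_a, otherwise 'B', independently per scan."""
    return "A" if rng.random() < share_a else "B"


def redirect_url(rng: random.Random, share_a: float = 0.5) -> str:
    """Return the URL a redirect handler would send this scan to."""
    return DESTINATIONS[choose_variant(rng, share_a)]


# Simulate 10,000 scans to check the split is close to even.
rng = random.Random(42)
counts = {"A": 0, "B": 0}
for _ in range(10_000):
    counts[choose_variant(rng)] += 1
print(counts)
```

Because the choice is made per scan, someone who scans twice may land on both pages. If your setup has a stable visitor identifier, assigning by a hash of that identifier is one common alternative, at the cost of extra plumbing.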

Tag each destination with a distinct utm_content value so Google Analytics reports the two variants on separate rows. Use a descriptive value such as hero-offer versus hero-testimonial, not a bare "a" and "b" you will not recognise in three weeks. The utm_content parameter is the standard place to label A/B variants, but remember it only separates the variants in your reports; the splitting itself happens in the redirect.
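
A minimal sketch of that tagging step, using only Python's standard library. The `utm_source` and `utm_medium` values, the campaign name, and the example URLs are assumptions; use whatever naming scheme your reports already follow.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def tag_variant(url: str, variant: str, campaign: str) -> str:
    """Return url with UTM parameters added, keeping any existing query string."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))
    query.update(
        {
            "utm_source": "qr",      # assumed value
            "utm_medium": "print",   # assumed value
            "utm_campaign": campaign,
            "utm_content": variant,  # the label that separates A from B
        }
    )
    return urlunparse(parts._replace(query=urlencode(query)))


page_a = tag_variant("https://example.com/offer?ref=poster", "hero-offer", "spring-launch")
page_b = tag_variant("https://example.com/offer-b", "hero-testimonial", "spring-launch")
print(page_a)
print(page_b)
```

Building the URLs in code rather than by hand keeps the two variants identical in every parameter except `utm_content`, which is the only difference the reports should show.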

A note on honesty about your tools. Not every dynamic-code setup performs an automatic random split. ScanKit, for instance, points each code at a single destination you control, which you can edit at any time. That does not give you a one-click random splitter, but it gives you something almost as useful for offline campaigns: the ability to swap the destination instantly and roll out a winner with no reprint, which is exactly what the next two methods rely on.

When you cannot split: sequential and matched-placement tests

If you cannot divide a single code's traffic randomly, you have two fallbacks. Both are weaker, and it is worth knowing exactly how.

A sequential test runs version A for a period, then switches the same code's destination to version B for an equal period, and compares. Because the destination is editable on a dynamic code, this needs no reprint: you just change where the code points when the second phase begins. The weakness is time itself: anything that changed between the two periods is now mixed into the result. A payday week, a heatwave, a holiday, a competitor's promotion, the weekday-versus-weekend mix: all of it confounds the comparison. Run each phase for whole weeks and avoid spanning a holiday, and treat the result as a strong hint rather than proof.

A matched-placement test uses two different printed codes in two locations, but chosen so the locations are as similar as you can make them: the same store layout, mirrored fixtures, comparable footfall. To cancel out the placement bias that remains, swap the variants between the locations halfway through, so each design spends equal time in each spot. This is better than a naive two-poster test, but it is still quasi-experimental. Report it as directional, not definitive.
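
To see why the halfway swap matters, here is a small Python sketch that pools each design's results across both locations. The store names and all counts are invented for illustration.

```python
from collections import defaultdict

# (design, location, scans, conversions); invented numbers for illustration.
observations = [
    ("A", "store-1", 300, 30),  # phase 1
    ("B", "store-2", 100, 6),   # phase 1
    ("B", "store-1", 300, 36),  # phase 2, after the swap
    ("A", "store-2", 100, 5),   # phase 2, after the swap
]


def pooled_rates(rows):
    """Conversion rate per design, pooled over every location it appeared in."""
    scans = defaultdict(int)
    conversions = defaultdict(int)
    for design, _location, n_scans, n_conversions in rows:
        scans[design] += n_scans
        conversions[design] += n_conversions
    return {design: conversions[design] / scans[design] for design in scans}


phase_one_only = pooled_rates(observations[:2])
both_phases = pooled_rates(observations)
print(phase_one_only)  # A looks ahead: it had the better spot
print(both_phases)     # B is ahead once each design has had both spots
```

In this made-up data, design A wins phase one only because it sat in the stronger location. Once both designs have spent a phase in each store, B comes out ahead. The pooled figure still inherits anything that changed between the two phases, so it remains directional.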

Sample size and significance, without the hand-waving

This is where offline testing demands honesty, because the numbers are usually small. Statistical significance, typically set at a 95 per cent confidence level, is the bar that says a difference is unlikely to be down to chance. You decide that bar, and the sample size you need to reach it, before the test starts.
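
For a rough idea of what that sample size looks like, here is a sketch of the standard normal-approximation formula for comparing two proportions. The 80 per cent power default and the example rates are assumptions, and a dedicated sample-size calculator may give a slightly different figure.

```python
import math
from statistics import NormalDist


def scans_per_variant(baseline: float, target: float,
                      alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate scans needed per variant for a two-sided two-proportion test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    pooled = (baseline + target) / 2
    spread_null = math.sqrt(2 * pooled * (1 - pooled))
    spread_alt = math.sqrt(baseline * (1 - baseline) + target * (1 - target))
    n = ((z_alpha * spread_null + z_beta * spread_alt) / (target - baseline)) ** 2
    return math.ceil(n)


# Assumed example: 5 per cent baseline, hoping to detect a rise to 7 per cent.
print(scans_per_variant(0.05, 0.07))
```

With these assumed inputs the answer comes out at a little over two thousand scans per variant, which is more than many printed campaigns will ever see. Halving the difference you want to detect pushes the requirement up sharply.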

The single most common way to lie to yourself is peeking. If you watch the test and declare a winner the instant it first crosses 95 per cent, you have not run a 95 per cent test. Evan Miller's well-known analysis gives a worked example in which stopping at the first significant moment pushes the real false-positive rate to around 26 per cent, more than five times the 5 per cent you assumed. The discipline is dull but decisive: fix the sample size and the end date up front, and only read the result when you get there.
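
The effect of peeking is easy to reproduce with a small simulation. The sketch below is not a reconstruction of Evan Miller's exact setup: the 10 per cent conversion rate, the ten looks, and the batch size are assumptions chosen to keep it quick. Both variants are identical, so every "winner" it finds is a false positive.

```python
import math
import random


def is_significant(conv_a: int, conv_b: int, n: int, z_crit: float = 1.96) -> bool:
    """Two-proportion z-test at roughly 95 per cent, equal scans per variant."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    if se == 0:
        return False
    return abs(conv_a - conv_b) / n / se > z_crit


def finds_winner(rng, peek: bool, rate=0.10, looks=10, batch=50) -> bool:
    """Run one A/A test; with peek=True, stop at the first significant look."""
    conv_a = conv_b = n = 0
    for _ in range(looks):
        conv_a += sum(rng.random() < rate for _ in range(batch))
        conv_b += sum(rng.random() < rate for _ in range(batch))
        n += batch
        if peek and is_significant(conv_a, conv_b, n):
            return True
    return is_significant(conv_a, conv_b, n)


def false_positive_rate(peek: bool, trials: int = 1000, seed: int = 7) -> float:
    rng = random.Random(seed)
    return sum(finds_winner(rng, peek) for _ in range(trials)) / trials


fixed = false_positive_rate(peek=False)
peeking = false_positive_rate(peek=True)
print(f"read once at the end: {fixed:.1%}")
print(f"stop at first 'win':  {peeking:.1%}")
```

Reading once at the end should land near the nominal 5 per cent, while stopping at the first significant look should come out several times higher. The exact figures depend on the seed and on how many looks you allow.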

A cheap sanity check is the A/A test: run two identical variants against each other. At a 95 per cent threshold, a properly behaving setup should call a "winner" only about 5 per cent of the time, by pure chance. If your tool crowns one far more often, something is wrong before you have tested anything real.

Now the uncomfortable part. Most printed QR campaigns simply do not generate enough scans, let alone enough conversions, to reach 95 per cent significance. That is not a failure of your method; it is the reality of offline volume. When that is the case, say so. Decide the sample size in advance, run the test honestly, and present a low-volume result as directional evidence rather than dressing it up as statistically proven. Correct and modest beats impressive and wrong, and clients remember which one you gave them. Resist the temptation to quote a tidy "this lifted conversions by 23 per cent" when the sample cannot support it.
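
One honest way to present a low-volume result is to report the range the true difference could plausibly sit in, rather than a single lift figure. The sketch below uses a simple Wald interval, a normal approximation that is itself rough at small counts; the scan and conversion numbers are invented.

```python
import math
from statistics import NormalDist


def difference_interval(conv_a, scans_a, conv_b, scans_b, confidence=0.95):
    """Wald confidence interval for rate_b - rate_a (a normal approximation)."""
    rate_a = conv_a / scans_a
    rate_b = conv_b / scans_b
    se = math.sqrt(rate_a * (1 - rate_a) / scans_a + rate_b * (1 - rate_b) / scans_b)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    diff = rate_b - rate_a
    return diff - z * se, diff + z * se


# Invented low-volume result: A converts 12 of 150 scans, B converts 18 of 150.
low, high = difference_interval(12, 150, 18, 150)
print(f"B minus A: between {low:+.1%} and {high:+.1%}")
```

Here B's rate is half as high again as A's (12 per cent against 8 per cent), which would make a tempting headline. But the interval runs from a few points below zero to around eleven points above, so the data cannot rule out that B is no better at all. That is what "directional" means in practice.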

The confounders that quietly ruin QR tests

Even a clean split can be spoiled by the things happening around it. Four to guard against:

  • Novelty effect. A new design can lift results simply because it is new, an effect that fades as people get used to it. Run long enough to see past the bump, commonly several weeks, and watch the winner for a couple of weeks after you roll it out to confirm the gain holds.
  • Seasonality and day of week. Traffic on a Saturday is not the same audience as a Tuesday. Run in multiples of whole weeks so the weekday and weekend mix is balanced, and avoid windows that straddle a holiday or a sale.
  • Placement differences. Covered above, and worth repeating because it is the QR-specific trap: if two codes sit in two different spots, you may be measuring the spot, not the creative.
  • Too many tests at once. Overlapping tests interact and muddy each other. Isolate them, or stagger them, so each result means one thing.
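
The whole-weeks rule and the novelty guard can be folded into one small planning helper. This is a sketch under stated assumptions: an even split between two variants, a steady daily scan volume, and a three-week floor standing in for "several weeks". The function name and defaults are hypothetical.

```python
import math


def weeks_to_run(scans_per_variant: int, expected_scans_per_day: float,
                 minimum_weeks: int = 3) -> int:
    """Whole weeks needed to collect the sample for two variants."""
    total_scans = 2 * scans_per_variant
    weeks_for_sample = math.ceil(total_scans / (expected_scans_per_day * 7))
    return max(weeks_for_sample, minimum_weeks)


print(weeks_to_run(2000, 40))  # 15
print(weeks_to_run(100, 100))  # 3
```

Rounding up to whole weeks keeps the weekday and weekend mix balanced, and the floor stops a high-traffic code from finishing before the novelty bump has faded.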

A repeatable A/B workflow for client campaigns

Put it together into something you can run the same way every time, in any client's dedicated workspace:

  1. Write the hypothesis. "Changing the landing-page headline to lead with the discount will raise sign-ups, because the offer is the reason people scanned." A real prediction, with a reason.
  2. Change one variable. Build page A and page B that differ in that one thing and nothing else.
  3. Choose the primary metric. One conversion rate, decided now, not after you see the data.
  4. Pick the split method. A randomised destination split if your tooling allows it; otherwise a sequential or matched-placement test, with its limits acknowledged in the brief.
  5. Set the sample size and run length up front. Whole weeks, long enough to outlast novelty, and no peeking.
  6. Read the result once, at the end. Against your pre-set confidence bar. Be honest if the volume only supports a directional call.
  7. Roll out the winner. Because it is a dynamic code, repoint the live destination to the winning page with no reprint, then monitor for a couple of weeks to confirm the lift is real and not novelty.
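
One way to hold yourself to steps 1 to 6 is to write the plan down as data before launch and let it decide when the result may be read. The class, field names, and example values below are hypothetical, a sketch of the idea rather than a feature of any tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: the plan cannot be edited once the test starts
class ExperimentPlan:
    hypothesis: str
    variable: str
    primary_metric: str
    split_method: str
    scans_per_variant: int
    end_date: date


def read_status(plan: ExperimentPlan, scans_a: int, scans_b: int, today: date) -> str:
    """Say whether the result may be read yet, and how strongly to state it."""
    if today < plan.end_date:
        return "wait"
    if min(scans_a, scans_b) >= plan.scans_per_variant:
        return "read"
    return "read-as-directional"


plan = ExperimentPlan(
    hypothesis="Leading with the discount in the headline will raise sign-ups",
    variable="landing-page headline",
    primary_metric="sign-up conversion rate",
    split_method="sequential",
    scans_per_variant=2000,
    end_date=date(2025, 3, 30),  # example date
)

print(read_status(plan, 900, 950, date(2025, 3, 16)))    # wait
print(read_status(plan, 1400, 1350, date(2025, 3, 30)))  # read-as-directional
print(read_status(plan, 2100, 2050, date(2025, 3, 30)))  # read
```

The frozen dataclass is a deliberate choice: once the test is live, nobody can quietly move the end date or swap the metric to suit the numbers.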

The payoff compounds. Each test that changes one variable teaches you something reusable about that client's audience, and the dynamic destination means acting on what you learn costs nothing in print. What you stop doing is just as valuable: chasing the things that feel like they should work. If you are still trying to lift raw scans, the honest guide to getting more scans pairs well with this one, because testing tells you which of those tactics actually moved your client's numbers.

Frequently asked questions

Can you A/B test a printed QR code?

Yes, but not by testing the printed code itself. A printed code is fixed, so you cannot randomly assign which version a passer-by sees. The reliable approach is to use one dynamic code and test the destination behind it, splitting scans between two landing pages server-side. If your tool cannot split traffic, you fall back to sequential or matched-placement tests, which are weaker but still useful when treated as directional.

What should I A/B test on a QR campaign?

Test the destination, because it is the part you can change after printing and the part that does most of the persuading. Good candidates are the landing-page headline, the offer, the call to action, and the page layout. Change one of them at a time. Testing the physical poster's design across different locations is unreliable, because placement and footfall confound the result.

What metric should I use to judge a QR A/B test?

Use a conversion rate: the share of people who scanned and then completed your goal, such as a sign-up or purchase. Do not judge on raw scan count, because scans mostly reflect where the code is placed and how busy the spot is, not whether the variant worked. Pick one primary metric before the test starts.

How long should a QR code A/B test run?

Long enough to reach your pre-decided sample size, in multiples of whole weeks so the weekday and weekend mix is balanced, and long enough to outlast the novelty effect, which often means several weeks. Decide the end date in advance and read the result only when you reach it, rather than stopping the moment it looks like a winner.

How many scans do I need for a statistically significant result?

There is no single number; it depends on your baseline conversion rate and the size of the difference you want to detect, so use a sample-size calculator before launching. Be realistic: most printed campaigns do not generate enough scans to reach 95 per cent significance, so plan for a directional result and say so honestly rather than overclaiming.

Do I need a dynamic QR code to run an A/B test?

In practice, yes. A static code's destination is baked into the print, so you cannot split traffic, swap a losing variant, or roll out a winner without reprinting. A dynamic code's editable destination is what makes every method here possible, from a server-side split to a sequential test to an instant winner roll-out.

What is utm_content used for in QR testing?

It labels each variant so Google Analytics reports them on separate rows. Give each version a descriptive value, such as hero-offer and hero-testimonial, rather than a bare "a" and "b". It separates the variants in your reporting, but it does not split the traffic itself; the split happens in the redirect.

What is an A/A test and why would I run one?

An A/A test runs two identical variants against each other. Since they are the same, neither should win, yet at a 95 per cent confidence level a healthy setup will still flag a false "winner" about 5 per cent of the time by chance. If your tool declares a winner far more often than that, the setup or the tool is faulty, and it is better to learn that before you trust it on a real test.

The short version

A printed QR code cannot be randomised the way a web page can, so the naive "two posters, two designs" test measures placement as much as creative. Test the part you can control instead: the destination behind a dynamic code.

The clean method is a single code whose redirect splits scans between landing page A and page B, with a distinct utm_content on each so analytics can tell them apart. If you cannot split traffic, run a sequential or matched-placement test and report it as directional. Judge on a conversion rate, not raw scans. Decide your sample size and end date up front, never peek and call a winner early, and stay honest when offline volume is too small to prove significance.

Then roll out the winner by repointing the dynamic destination, with no reprint, and watch it for a fortnight to be sure the lift is real. Write the hypothesis, change one thing, measure one rate, and let the data settle the argument the next time the room is split over blue versus orange.
