Common questions about CPP A/B Testing

  • What does “desired precision” mean, and how does it affect the test?

“Desired precision” refers to the margin of error you’re willing to accept in the test results, particularly for KPIs like conversion rate (CR) and tap-through rate (TTR). A 1% setting means a narrow margin of error and highly accurate results, while a 5% setting allows a wider margin of error and less accuracy. We cap the margin of error at 5% to ensure the reliability of the test results.

The default is set to 1%, which yields the most precise results, but you can increase it up to 5% to reduce test duration. A tighter margin (closer to 1%) means the system needs more data, that is, more traffic and a longer test, to reach a conclusion. For example, if a variant has a CR of 25% with a 3% precision setting, the true CR is likely between 22% and 28%.
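
To make the relationship concrete, here is a minimal sketch of how a required sample size could follow from a chosen margin of error, assuming a standard normal approximation for a proportion at a 90% confidence level. The product’s exact formula isn’t documented here, so treat the numbers as illustrative.

```python
import math

Z_90 = 1.645  # two-sided z-score for a 90% confidence level

def required_sample_size(baseline_rate: float, margin_of_error: float) -> int:
    """Taps needed per variant so the interval half-width stays within the margin."""
    variance = baseline_rate * (1 - baseline_rate)
    return math.ceil((Z_90 ** 2) * variance / margin_of_error ** 2)

# A 25% CR measured to +/-3% needs far less traffic than +/-1%:
print(required_sample_size(0.25, 0.03))  # 564 taps per variant
print(required_sample_size(0.25, 0.01))  # 5074 taps per variant
```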

You can also shorten test duration by reducing the number of variants or choosing ad groups with higher traffic in the last 30 days.

  • How does the system ensure equal exposure for each variant in a Switch test?

The system analyzes traffic patterns over the past month, looking at fluctuations and identifying high and low traffic days, to determine an optimal switching mechanism. It then balances exposure by adjusting the switching times so each variant benefits from a similar mix of traffic conditions.

When traffic levels and variability are within acceptable thresholds, the system may allow more frequent switching (e.g., hourly or daily), which helps reduce the overall test duration while maintaining fairness in exposure.
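
As an illustration only (the actual thresholds aren’t published), the interval choice might resemble a simple rule based on traffic volume and variability:

```python
import statistics

def choose_switch_interval(daily_taps: list[int]) -> str:
    """Hypothetical rule: switch more often when traffic is high and stable.

    Assumes at least two days of nonzero traffic; thresholds are made up.
    """
    mean = statistics.mean(daily_taps)
    cv = statistics.stdev(daily_taps) / mean  # coefficient of variation
    if mean >= 500 and cv <= 0.2:
        return "hourly"  # plenty of stable traffic: frequent switches are safe
    if cv <= 0.5:
        return "daily"
    return "weekly"      # volatile traffic: longer windows even out exposure
```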

  • What’s the minimum amount of traffic needed for a test to be statistically valid?

There isn’t a fixed number; instead, the system calculates the required test duration based on several factors: past traffic volumes of the selected ad groups, number of variants, the 90% confidence level, desired precision, and traffic fluctuations.

The goal is to ensure each variant receives enough traffic to detect meaningful performance differences and reach statistically reliable conclusions.
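
Under the same normal-approximation assumption as the precision sketch above, a rough duration estimate could combine these factors like so (illustrative, not the product’s internals):

```python
import math

def estimated_days(required_per_variant: int, num_variants: int,
                   avg_daily_taps: int) -> int:
    """Days until every variant has collected its required sample."""
    total_needed = required_per_variant * num_variants
    return math.ceil(total_needed / max(avg_daily_taps, 1))

# e.g. two variants needing 5,074 taps each, on 400 taps/day:
print(estimated_days(5074, 2, 400))  # 26 days
```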

  • How is the 90% confidence level determined, and why is it important?

A 90% confidence level means that if the same test were run 10 times under identical conditions, the results would be consistent at least 9 out of those 10 times.

This level balances accuracy with practicality: higher confidence levels require significantly more traffic and longer durations. We set 90% as the standard to ensure reliable results without excessive testing time. This approach aligns with industry standards, such as Apple’s use of 90% confidence in product page optimization (PPO) tests.
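
A quick simulation illustrates the “9 out of 10” intuition: 90% confidence intervals built around a known 25% conversion rate contain the true value roughly 90% of the time. This is a generic statistics sketch, not the product’s code.

```python
import math
import random

random.seed(42)
TRUE_CR, TAPS, Z_90, TRIALS = 0.25, 2000, 1.645, 1000

covered = 0
for _ in range(TRIALS):
    conversions = sum(random.random() < TRUE_CR for _ in range(TAPS))
    p_hat = conversions / TAPS
    half_width = Z_90 * math.sqrt(p_hat * (1 - p_hat) / TAPS)
    if p_hat - half_width <= TRUE_CR <= p_hat + half_width:
        covered += 1

print(covered / TRIALS)  # ~0.90: the interval captures the truth ~9 times in 10
```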

  • What should I do if the test ends with no significant difference between variants?

If no variant clearly outperforms the others, you can still review metrics like impressions, CR, and TTR to choose the most promising option.

A lack of significant difference indicates that the variants are likely to perform similarly over time. The test doesn’t guarantee that one variant will outperform the others; it simply ensures that, given your selected precision and confidence level, the results are statistically sound and comparable.
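
For intuition, “no significant difference” can be pictured as a two-proportion z-test failing to clear the 90% threshold. This is a textbook sketch under that assumption, not necessarily the exact test the system runs.

```python
import math

def significantly_different(conv_a: int, taps_a: int,
                            conv_b: int, taps_b: int) -> bool:
    """Two-sided two-proportion z-test at a 90% confidence level."""
    p_a, p_b = conv_a / taps_a, conv_b / taps_b
    pooled = (conv_a + conv_b) / (taps_a + taps_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / taps_a + 1 / taps_b))
    return abs(p_a - p_b) / se > 1.645

# 25.0% vs. 26.2% CR on 1,000 taps each is too close to call:
print(significantly_different(250, 1000, 262, 1000))  # False
```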

  • How do I choose between “switching ad groups” and “switching ads” for a test?

Both methods test multiple custom product pages, but they differ in structure:

  • Switch Ads: The original ad group is kept, and multiple ads are created within it, each linked to a different custom product page. The system then rotates between these ads. Because only one ad group is used in the “Switch Ads” method, you have the option to keep ongoing automated optimization running.
  • Switch Ad Groups: The original ad group is duplicated for each variant (along with the original). Each duplicated group is assigned one custom product page, and the system enables/disables ad groups to rotate the variants. Automated optimizations are disabled during the test period.

The core logic of the test remains the same, so your choice may depend on your setup preferences or campaign management needs.
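
As a hypothetical data model (names are illustrative, not the product’s API), the structural difference looks like this:

```python
from dataclasses import dataclass, field

@dataclass
class SwitchAdsTest:
    ad_group_id: str                                 # the single original ad group
    ad_ids: list[str] = field(default_factory=list)  # one ad per custom product page
    optimization_enabled: bool = True                # one group, so optimization can stay on

@dataclass
class SwitchAdGroupsTest:
    ad_group_ids: list[str] = field(default_factory=list)  # original + duplicates
    optimization_enabled: bool = False               # disabled for the test period
```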

  • How does traffic stabilization work?

When multiple ad group variants run in parallel, Apple’s system automatically distributes traffic among them. However, since all variants share the same keywords, Apple may favor one variant and give it most of the traffic, leading to biased or misleading results.

To prevent this, users can enable “Stabilize Traffic.” When it’s enabled, the system monitors each variant’s traffic hourly. If one variant starts receiving disproportionately high traffic, it is temporarily paused to let the others catch up. The system ensures that the difference in traffic between the variant receiving the most and the one receiving the least does not exceed 25%. This continuous monitoring keeps traffic distribution fair with minimal status changes.
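
One way to picture the hourly check: the 25% spread rule comes from the description above, while the pause logic and the threshold interpretation are assumptions.

```python
def variants_to_pause(hourly_taps: dict[str, int]) -> list[str]:
    """Pause the leader if it exceeds the slowest variant's traffic by >25%."""
    leader = max(hourly_taps, key=lambda v: hourly_taps[v])
    slowest = min(hourly_taps.values())
    if hourly_taps[leader] > slowest * 1.25:
        return [leader]  # temporarily paused so the others catch up
    return []

print(variants_to_pause({"A": 900, "B": 600, "C": 650}))  # ['A']
print(variants_to_pause({"A": 620, "B": 600, "C": 650}))  # []
```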

  • Do I lose traffic when running CPP A/B Tests?

Some traffic loss is possible and expected during A/B testing. Because the system switches ad group statuses and Apple needs time to reflect these updates, you may observe a temporary dip in total traffic.

  • Can I create an A/B test for an ad group that I recently created?

No. To generate statistically valid results, the system requires historical data from the ad group. We use the previous month’s traffic as a benchmark to estimate the required test duration and traffic volume. The benchmark helps calculate the confidence level and precision of the results. In short, an ad group must have sufficient past data for the system to determine accurate and reliable outcomes. 

  • How do I shorten the test duration?

The total duration of a custom product page A/B test depends on your traffic volume, number of variants, and selected precision. To shorten your test while maintaining accuracy, you can take any of the steps below; a combined sizing sketch follows the list.

    • Increase desired precision: Raising the margin of error from 1% to 3–5% reduces the amount of data required. This shortens the test but slightly reduces accuracy.
    • Use high-traffic ad groups: Select ad groups with higher daily taps and installs to reach the required sample size faster.
    • Test fewer variants: Running two variants instead of four significantly decreases required traffic and test time.
    • Choose a longer switch interval for Switch tests: Daily or weekly switches help balance exposure faster in lower-traffic cases.
    • Ensure stable traffic: Fluctuating traffic extends the test duration, since the system waits to capture data across consistent conditions.
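
Putting the levers together, under the same illustrative 90%-confidence sizing assumption used earlier (not the product’s exact formula):

```python
import math

def days_needed(margin: float, variants: int, daily_taps: int,
                baseline_cr: float = 0.25) -> int:
    """Rough test length from precision, variant count, and traffic."""
    per_variant = math.ceil(1.645 ** 2 * baseline_cr * (1 - baseline_cr) / margin ** 2)
    return math.ceil(per_variant * variants / daily_taps)

print(days_needed(0.01, 4, 400))  # 1% precision, 4 variants: ~51 days
print(days_needed(0.03, 2, 400))  # 3% precision, 2 variants: ~3 days
```
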
  • What might affect the health of the test in a negative manner?

Actions that can compromise test health include bid changes, budget changes, and status changes to ad groups, campaigns, and keywords. Since custom product pages are directly tied to screenshots, custom product page assignments should remain stable and screenshots should not be changed. Description changes can also affect ad group performance, positively or negatively, so descriptions should likewise stay unchanged during the test for a healthy result.