🧪 A/B Test Significance Calculator — Google Optimize Alt

Drop-in Google Optimize replacement. Calculate p-values, confidence intervals, lift, sample sizes, and winners for A/B tests. Two-proportion z-test + Welch's t-test + Bonferroni multi-variant correction. API-first, no account, $0.007/test.

Pricing: Pay per usage · Developer: Stephan Corbeil · Maintained by Community

A/B Test Significance Calculator — Google Optimize Replacement

Drop-in replacement for Google Optimize's statistical significance workflow. Get p-values, confidence intervals, lift, statistical power, and required sample sizes โ€” via API, no browser, no account, no monthly fee.

Why this exists: Google Optimize was shut down on September 30, 2023. Google pushed users toward paid enterprise tools (GA4 plus Optimize 360 partners such as VWO and AB Tasty, starting at $250+/mo), while open-source alternatives like GrowthBook and PostHog require self-hosting. The simple "paste variant data, get p-value" workflow that most SMB users actually needed was never replaced.

This actor does that one thing exceptionally well. Built for marketers, product managers, engineers, and data scientists who need to answer "did my test win?" on demand — inside Zapier workflows, CI/CD pipelines, Slack bots, or notebooks.

๐Ÿ”‘ Features

  • Conversion-rate tests โ€” two-sample z-test for proportions (signup rate, purchase rate, click-through rate)
  • Continuous metrics โ€” Welch's t-test for unequal variances (revenue per user, session length, page views)
  • Multi-variant support โ€” up to 6 variants with automatic Bonferroni correction for familywise error rate
  • Full output โ€” p-value, 95% CI, absolute + relative lift, statistical power achieved, per-variant stats
  • Sample size calculator โ€” tells you how many visitors per variant you need for a given MDE and power
  • Winner flag + human recommendation โ€” one-line "ship Variant A" or "keep testing" verdict
  • Zero scraping, zero JS โ€” pure Python scipy.stats. Runs in under 2 seconds. Lowest-compute actor in the fleet.

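The sample-size calculator follows the standard two-proportion formula. A minimal sketch under stated assumptions — the function name is illustrative and the minimum detectable effect is taken as an absolute difference in conversion rate, which may differ from the actor's exact implementation:

```python
from math import ceil, sqrt
from scipy.stats import norm

def required_sample_size(p_base, mde_abs, alpha=0.05, power=0.80):
    """Visitors per variant needed to detect an absolute lift of
    `mde_abs` over baseline rate `p_base` (two-sided test)."""
    p_var = p_base + mde_abs
    z_alpha = norm.ppf(1 - alpha / 2)   # critical value for two-sided alpha
    z_beta = norm.ppf(power)            # critical value for desired power
    variance = p_base * (1 - p_base) + p_var * (1 - p_var)
    n = variance * ((z_alpha + z_beta) / mde_abs) ** 2
    return ceil(n)

# Baseline 5% conversion, detect a 1-point absolute lift at 80% power:
n = required_sample_size(0.05, 0.01)  # roughly 8.2k visitors per variant
```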
๐Ÿ’ผ Common Use Cases

  • Post-test analysis โ€” drop-in replacement for Google Optimize's significance display
  • Zapier / Make.com workflows โ€” auto-analyze Mixpanel/Amplitude test results nightly
  • CI/CD integration โ€” gate deployments on experimental success
  • Slack bots โ€” /significance variant_a=250/5000 variant_b=310/5000 โ†’ auto-response
  • Notebooks / BI dashboards โ€” trigger via API, render results alongside visualizations
  • Marketing team Slack โ€” daily digest of running test statuses
  • Startup PMs โ€” quickly sanity-check whether the "10% lift" from the latest test is real

๐Ÿ“ฅ Input Example โ€” Conversion Test

```json
{
  "metricType": "conversion",
  "variants": [
    {"name": "Control", "visitors": 5000, "conversions": 250},
    {"name": "Variant A (new CTA)", "visitors": 5000, "conversions": 310},
    {"name": "Variant B (redesign)", "visitors": 5000, "conversions": 295}
  ],
  "alpha": 0.05,
  "power": 0.80,
  "minDetectableEffect": 0.05
}
```

๐Ÿ“ฅ Input Example โ€” Continuous Metric

```json
{
  "metricType": "continuous",
  "variants": [
    {"name": "Control", "n": 1000, "mean": 42.50, "std": 18.30},
    {"name": "Variant A", "n": 1000, "mean": 46.80, "std": 19.10}
  ],
  "alpha": 0.05
}
```

๐Ÿ“ค Output

```json
{
  "metric_type": "conversion",
  "significance_level_alpha": 0.05,
  "alpha_bonferroni_corrected": 0.025,
  "num_variants": 3,
  "num_comparisons": 2,
  "control_variant": "Control",
  "variants_summary": [
    {"name": "Control", "visitors": 5000, "conversions": 250, "conversion_rate": 0.05},
    {"name": "Variant A", "visitors": 5000, "conversions": 310, "conversion_rate": 0.062}
  ],
  "comparisons": [
    {
      "variant": "Variant A (new CTA)",
      "vs": "Control",
      "significant": true,
      "p_value": 0.009234,
      "z_score": 2.6078,
      "lift_absolute": 0.012,
      "lift_relative": 0.24,
      "ci_95_lower": 0.003,
      "ci_95_upper": 0.021,
      "statistical_power": 0.84
    }
  ],
  "winner": "Variant A (new CTA)",
  "required_sample_size_per_variant": 6162,
  "recommendation": "Ship Variant A (new CTA). It beats Control with 24.0% relative lift at p<0.025 (Bonferroni-corrected)."
}
```

๐Ÿ Python SDK Example

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("nexgendata/ab-test-calculator").call(run_input={
    "metricType": "conversion",
    "variants": [
        {"name": "Control", "visitors": 10000, "conversions": 520},
        {"name": "Variant A", "visitors": 10000, "conversions": 605},
    ],
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print("Winner:", item["winner"])
    print("Recommendation:", item["recommendation"])
```

๐ŸŒ cURL Example

```shell
curl -X POST "https://api.apify.com/v2/acts/nexgendata~ab-test-calculator/run-sync-get-dataset-items?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "variants": [
      {"name": "Control", "visitors": 5000, "conversions": 250},
      {"name": "Variant A", "visitors": 5000, "conversions": 310}
    ]
  }'
```

๐Ÿ”— Zapier / Make.com Integration

Perfect for automating test analysis. Trigger: "New row in Experiments Google Sheet" โ†’ Action: "Run Actor (A/B Test Calculator)" with variant data โ†’ Output: Post winner + lift to Slack, Notion, or email.

โ“ FAQ

Q: What test is used for conversion metrics? Two-sample z-test for proportions with pooled standard error. Matches Google Optimize's "Relative improvement" calculation and what Evan Miller's calculator uses.
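For reference, the pooled-SE z-test on the conversion example above can be reproduced in a few lines of scipy (a sketch of the standard formula, not the actor's source):

```python
from math import sqrt
from scipy.stats import norm

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided two-sample z-test for proportions with pooled standard error."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled conversion rate
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return z, 2 * norm.sf(abs(z))                     # two-sided p-value

z, p = two_proportion_ztest(250, 5000, 310, 5000)  # Control vs Variant A
```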

Q: What about continuous metrics (revenue, session length)? Welch's t-test with Welch-Satterthwaite degrees of freedom. Handles unequal variances, which is the realistic case for revenue data.
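With summary statistics like those in the continuous-metric input above, the same test is a single scipy call (`equal_var=False` selects Welch's variant with Welch-Satterthwaite degrees of freedom):

```python
from scipy.stats import ttest_ind_from_stats

# Welch's t-test from summary statistics; equal_var=False means the
# two samples are not assumed to share a variance.
t, p = ttest_ind_from_stats(
    mean1=42.50, std1=18.30, nobs1=1000,   # Control
    mean2=46.80, std2=19.10, nobs2=1000,   # Variant A
    equal_var=False,
)
```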

Q: How are multiple variants handled? All treatments are compared against the first variant (control). Alpha is Bonferroni-corrected by dividing by the number of comparisons to control familywise error rate.
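The correction itself is a one-liner. With the three-variant conversion example (two comparisons against Control):

```python
alpha = 0.05
num_comparisons = 2                        # Variant A vs Control, Variant B vs Control
alpha_corrected = alpha / num_comparisons  # each comparison tested at 0.025
significant = 0.009234 < alpha_corrected   # Variant A's p-value clears the stricter bar
```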

Q: Is this Bayesian or frequentist? Frequentist. This matches Google Optimize's default behavior and what most teams already reason about (p-values, CIs). Bayesian support is on the roadmap.

Q: Does it work for sequential testing / peeking? Not natively — run it once per test, after the test concludes. Sequential testing requires different statistical treatment (alpha spending, mSPRT). Coming in v1.1.

Q: How is this different from Evan Miller's free calculator? Same math, but API-first. Automate across many tests instead of copy-pasting into a web form. Great for portfolios of experiments.

๐Ÿ’ฐ Pricing (Pay-Per-Event)

  • Actor start: $0.005
  • Test analyzed: $0.002

Typical run cost: $0.007 per test analyzed. Effectively free for individual use. At 1,000 tests/month you're still under $7.

๐Ÿš€ Apify Affiliate Program

New to Apify? Sign up with our referral link for free platform credits.


A Google Optimize replacement for marketers, PMs, and data scientists who miss the simple workflow. Built by NexGenData.