Website Screenshot — Full Pages, Any Resolution, PNG, No Limits

Website screenshots as PNG/JPEG in 2 min — full-page, desktop + mobile, custom viewport, bulk URL input. Backed by a 951-run Trustpilot flagship and a 31-actor portfolio. For competitor visual tracking + UX research. spinov001@gmail.com · blog.spinov.online · t.me/scraping_ai


Website Screenshot Scraper — Playwright PNG/JPEG Capture, Custom Viewport

Capture batches of webpage screenshots — full-page or viewport-only, PNG or JPEG, custom width/height — to an Apify key-value store. Zero local browser install, zero Playwright boilerplate.

Headless Chromium (via Playwright) loads the URL with the domcontentloaded waiting strategy, optionally waits for a CSS selector, captures the screenshot, stores it in the run's key-value store, and pushes one dataset record per URL with the signed image URL plus capture metadata.
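
Quick start with the Apify Python client — a minimal call using all defaults (replace YOUR_APIFY_TOKEN with your own token):

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

# Single URL, default 1280×720 viewport-only PNG.
run = client.actor("knotless_cadence/website-screenshot-scraper").call(
    run_input={"urls": ["stripe.com"]}  # bare hostname; https:// gets prepended
)
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item.get("screenshotUrl", item.get("error")))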


What you actually get (verified against src/main.js)

Output schema — one record per URL

{
  "url": "https://stripe.com",
  "title": "Stripe | Financial Infrastructure for the Internet",
  "screenshotKey": "screenshot_stripe_com_1714398900000",
  "screenshotUrl": "https://api.apify.com/v2/key-value-stores/<storeId>/records/<screenshotKey>",
  "format": "png",
  "width": 1280,
  "height": 720,
  "fullPage": false,
  "fileSize": 286410,
  "scrapedAt": "2024-04-29T12:00:00.000Z"
}

10 fields per success record. On error, the actor pushes { url, error: "<reason>", scrapedAt } so your downstream pipeline can retry the failures selectively.
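
A sketch of that selective retry, assuming the client and run objects from the quick-start above:

# Re-run only the URLs that produced error records.
items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
failed = [i["url"] for i in items if "error" in i]
if failed:
    retry_run = client.actor("knotless_cadence/website-screenshot-scraper").call(
        run_input={"urls": failed}
    )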

screenshotUrl points to the file inside the Apify run's default key-value store. The store is retained per Apify's plan defaults (typically 14 days on free tier; longer on paid). For permanent retention, post-process the run via Apify webhook → S3 / R2 / your own object store.
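
A minimal sketch of that copy step, assuming boto3 credentials are configured; the bucket name my-screenshot-archive is illustrative:

import boto3
import requests

s3 = boto3.client("s3")
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    if "screenshotUrl" not in item:
        continue  # skip error records
    resp = requests.get(item["screenshotUrl"], timeout=60)
    resp.raise_for_status()
    s3.put_object(
        Bucket="my-screenshot-archive",
        Key=item["screenshotKey"] + "." + item["format"],  # "png" or "jpeg"
        Body=resp.content,
        ContentType="image/" + item["format"],
    )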


Input (full schema, all 8 fields exposed in UI)

| Parameter | Type | Default | Range | Description |
|---|---|---|---|---|
| urls | array | [] | required, ≥1 | List of URLs. Plain hostnames (stripe.com) get https:// prepended automatically. |
| fullPage | boolean | false | | true for entire scroll height; false for viewport-only. |
| width | integer | 1280 | 320–3840 | Browser viewport width in pixels. |
| height | integer | 720 | 240–2160 | Browser viewport height in pixels. |
| format | string | "png" | "png" \| "jpeg" | Output format. |
| quality | integer | 80 | 1–100 | JPEG quality. Ignored when format="png". |
| waitForSelector | string | "" | CSS selector | Optional CSS selector — actor waits up to 10 s for it before capturing. Empty = skip. |
| waitTime | integer | 2000 | 0–30000 ms | Extra delay after load before capture. Note: a hardcoded 2000 ms settle ALSO runs before this — total minimum settle = 2000 + waitTime (default total = 4000 ms). |
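
An input exercising all 8 fields (values illustrative):

{
  "urls": ["stripe.com", "https://linear.app"],
  "fullPage": true,
  "width": 1440,
  "height": 900,
  "format": "jpeg",
  "quality": 85,
  "waitForSelector": ".hero",
  "waitTime": 3000
}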

How it works

  1. Launch headless Chromium (Playwright chromium.launch({ headless: true })).
  2. New browser context with the requested width × height viewport.
  3. For each URL:
    • page.goto(url, { waitUntil: 'domcontentloaded', timeout: 45000 }).
    • page.waitForLoadState('load', { timeout: 15000 }) — best-effort, swallows timeout.
    • page.waitForTimeout(2000) — settles late-paint elements.
    • If waitForSelector: page.waitForSelector(selector, { timeout: 10000 }), swallows timeout.
    • Additional page.waitForTimeout(waitTime) if waitTime > 0.
    • page.screenshot({ fullPage, type: format, quality? }).
    • Save buffer to KV store as screenshot_<domain>_<epochMs>.
    • Push one dataset record.

Why not networkidle? Pages with persistent SSE / WebSockets / live analytics (Stripe, Linear, Vercel) never reach networkidle — Playwright would time out on them. The actor explicitly uses domcontentloaded + a soft load wait + a fixed waitTime to handle late-paint reliably.
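
The actor itself is Node.js Playwright (src/main.js); for reference, the same wait sequence sketched in Playwright for Python:

from playwright.sync_api import sync_playwright

with sync_playwright() as pw:
    browser = pw.chromium.launch(headless=True)
    page = browser.new_context(viewport={"width": 1280, "height": 720}).new_page()
    page.goto("https://stripe.com", wait_until="domcontentloaded", timeout=45000)
    try:
        page.wait_for_load_state("load", timeout=15000)  # best-effort soft wait
    except Exception:
        pass  # swallow the timeout, proceed with current page state
    page.wait_for_timeout(2000)  # hardcoded settle for late-paint elements
    page.wait_for_timeout(2000)  # configurable waitTime (default 2000 ms)
    png_bytes = page.screenshot(full_page=False, type="png")
    browser.close()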


Honest limitations (read before bulk runs)

  • Total minimum settle = 2000 ms + waitTime. There's a hardcoded page.waitForTimeout(2000) AFTER domcontentloaded and BEFORE the configurable waitTime. Default total is 2000 + 2000 = 4000 ms of fixed delay per URL on top of network time. For very fast pages this is overkill; for very slow SPA shells it may still be too short — adjust waitTime.
  • Single browser context, sequential URL processing. The actor opens ONE Chromium context and processes URLs in a for loop. 100 URLs × ~6 s wall-clock each ≈ 10 min. No parallelism.
  • One outer try/catch wraps browser launch only — per-URL errors are caught. If a single URL fails (timeout, DNS, navigation error), the actor pushes { url, error, scrapedAt } and CONTINUES to the next URL. However, if the browser itself crashes mid-batch, the whole run aborts (no auto-relaunch).
  • Cloudflare Turnstile / hCaptcha / anti-bot walls block the actor. Standard headless Chromium fingerprint — no stealth plugins. Cloudflare will challenge or block; expect either an error record or a screenshot of the challenge page.
  • No login / cookie injection. Fresh browser context per run. Pages behind auth render their pre-login state. Login-walled captures = custom build.
  • No element-crop, no auto-scroll for lazy-load images, no banner-dismissal heuristics. Full-page captures of cookie-banner-heavy sites will show the banner overlay. Custom build can dismiss common banners (Cookiebot, OneTrust, Quantcast).
  • No proxy. Direct browser launch on the Apify worker IP. Geo-restricted pages render as seen from the worker's region (typically US/EU).
  • Screenshot retention is per-Apify-plan default — typically 14 days on free tier, longer on paid. For permanent retention, copy via webhook to your own S3 / R2 / Backblaze.
  • Filename is timestamp-keyed screenshot_<domain>_<epochMs> — repeated captures of the same URL produce DIFFERENT keys (no overwrite). Useful for archival; means key-value store grows unbounded — manage retention yourself.
  • title is page document.title AT capture time — for SPAs the title may still be the shell's default if hydration hasn't completed within the 4 s settle window.
  • waitForSelector timeout (10 s) is silent — if the selector never appears, the actor proceeds with the current page state (caught .catch(() => {})).
  • urls = [] is silently accepted — actor exits without pushing any records (only browser launch logs).

Who buys this actor

  • Visual-regression QA engineers running nightly screenshot diffs against staging + prod to catch CSS regressions before users do.
  • Competitive-intel teams archiving weekly snapshots of competitor landing pages (pricing, feature lists, hero copy) for deal-review decks.
  • Content archival / journalism preserving webpage state for takedown resilience (source of truth when a page later changes or 404s).
  • Link-preview / OG-image fallback services generating thumbnail cards for social feeds when the upstream page lacks proper og:image tags.
  • Brand / trademark monitoring capturing how your logo or copy is displayed on partner, affiliate, and unauthorized resale sites.
  • MCP / LLM-agent tools giving an agent the ability to "see" a webpage when DOM-only context isn't enough.

Python example — visual-regression diff

Capture the same set of paths twice (staging + prod) and flag any byte-size delta >5%:

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")
pages = ["/", "/pricing", "/docs", "/blog", "/login"]

def capture(base_url: str) -> dict[str, dict]:
    run = client.actor("knotless_cadence/website-screenshot-scraper").call(run_input={
        "urls": [base_url + p for p in pages],
        "fullPage": True,
        "width": 1440,
        "height": 900,
        "format": "png",
    })
    items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
    # Key each record by its path so staging and prod line up.
    return {i["url"].replace(base_url, ""): i for i in items}

staging = capture("https://staging.example.com")
prod = capture("https://www.example.com")

for path in pages:
    s = staging.get(path, {}).get("fileSize", 0)
    p = prod.get(path, {}).get("fileSize", 0)
    if s and p and abs(s - p) / p > 0.05:  # require both captures to exist
        print(f"⚠ {path} diff {((s - p) / p) * 100:+.1f}% staging={s}B prod={p}B")
        print(f"  {staging[path]['screenshotUrl']}")
        print(f"  {prod[path]['screenshotUrl']}")

MCP / LLM-agent integration

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

# Tool definition the agent sees (Anthropic-style input_schema).
tools = [{
    "name": "capture_webpage",
    "description": "Take a screenshot of a webpage and return the image URL.",
    "input_schema": {
        "type": "object",
        "properties": {
            "url": {"type": "string"},
            "fullPage": {"type": "boolean", "default": False},
        },
        "required": ["url"],
    },
}]

# Handler the agent's tool call dispatches to.
def capture_webpage(url: str, full_page: bool = False) -> str:
    run = client.actor("knotless_cadence/website-screenshot-scraper").call(run_input={
        "urls": [url], "fullPage": full_page, "format": "png",
    })
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())[0]["screenshotUrl"]

Pair with Claude Vision / GPT-4o for accessibility audits, brand-compliance checks, or end-to-end QA that tests "looks right" not just "DOM matches".


Common questions

Q: Can I capture a specific element instead of the whole page? A: Not in this actor. Workaround: capture full page, crop locally with Pillow / sharp using the element's bounding box from a companion DOM query. Available as a custom build (see Custom scraping below).
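
A sketch of that workaround with Pillow, assuming item is a success record from the dataset and the bounding box came from a companion DOM query (coordinates illustrative):

from io import BytesIO

import requests
from PIL import Image

box = {"x": 120, "y": 640, "width": 800, "height": 420}  # from your DOM query
img = Image.open(BytesIO(requests.get(item["screenshotUrl"], timeout=60).content))
img.crop((box["x"], box["y"],
          box["x"] + box["width"], box["y"] + box["height"])).save("element.png")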

Q: How do I get screenshots at multiple breakpoints (320, 768, 1440 px) in one run? A: Call the actor 3 times with different width. Native multi-viewport input is on the roadmap but not implemented yet.
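
Until then, a simple loop covers it, assuming the client object from the earlier examples:

# One run per breakpoint; heights are illustrative companions to the widths.
for width, height in [(320, 568), (768, 1024), (1440, 900)]:
    client.actor("knotless_cadence/website-screenshot-scraper").call(
        run_input={"urls": ["stripe.com"], "width": width, "height": height}
    )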

Q: What about pages behind a login wall? A: Not supported in v1.0 — the actor uses a fresh browser context per run with no cookie / session injection. Custom build with cookie / session-token injection available on request.

Q: Does this bypass Cloudflare or captchas? A: No. Standard headless Chromium fingerprint. Aggressive bot-protection (Cloudflare Turnstile, hCaptcha) will block the actor.

Q: Can I schedule this nightly? A: Yes — Apify has native cron scheduling. Set the actor to run daily, pipe the output dataset to your webhook / Slack / S3 sync.

Q: How long do screenshots stay accessible? A: Per Apify plan defaults — typically 14 days on free tier, longer on paid. For permanent retention, copy PNGs to your own S3 / R2 / Backblaze bucket via Apify webhook or a post-run script.


Visual / monitoring toolkit (companion actors)

| Tool | Purpose |
|---|---|
| Website Screenshot Scraper (this) | Capture any page visually |
| Website Uptime Checker | Monitor availability / latency |
| Broken Links Checker | Find 404s on your site |
| PageSpeed Insights Scraper | Lighthouse / Core Web Vitals |
| HTTP Headers Checker | Security-headers audit |
| Webpage Text Extractor | Clean article text from HTML |
| URL Expander | Resolve shortlink chains |

All 31 published actors free to inspect on Apify Store.


Custom scraping — pilot tiers

Need element-crop, multi-viewport, login-walled captures, or a different schema (visual-diff metric, OCR'd text-overlay, automatic banner dismissal)? Three tiers:

  • Pilot — $97 · 1 actor, basic config, 7-day support. Good entry point — useful for a single visual-regression pipeline or a one-off competitor archival sweep.
  • Standard — $297 · custom actor + Slack/email alerts on results, 30-day support. Most QA-automation and competitive-intel projects fit here.
  • Premium — $797 · custom actor + dashboard + 90-day support + 1 modification round. For ongoing pipelines (daily multi-breakpoint capture, brand-monitoring rollups).

Email: spinov001@gmail.com — drop the URL list and the schema you need; quote within 48h.

Proof of work: 31 published Apify scrapers (78 total in portfolio) — Trustpilot 949 runs, Reddit 80+, Google News 43, Glassdoor 37, Email Extractor 36+. Recently delivered a paid 3-article series for a client in the proxy industry ($150).

More tips: t.me/scraping_ai · blog.spinov.online


Disclaimer

Designed for QA, archival, and competitive-research use. Respect target-site Terms of Service, applicable data-protection law (GDPR, CCPA), and capture publicly accessible pages only. Not affiliated with any of the example domains shown.

Honest disclosure: 10 output fields per success record (url, title, screenshotKey, screenshotUrl, format, width, height, fullPage, fileSize, scrapedAt). All 8 input fields exposed in INPUT_SCHEMA (UI form). Total minimum settle = 2000 ms hardcoded + waitTime (default total = 4000 ms). Sequential processing — no parallelism. Per-URL errors push an error record and continue; a browser crash aborts the run. No element-crop, no cookie / session injection, no auto-scroll for lazy-load, no Cloudflare / captcha bypass, no proxy. Wait strategy is domcontentloaded + soft load + fixed waitTime; networkidle is intentionally avoided because it hangs on SSE / WebSocket sites.