Universal Web Snapshot - HTML, Text, Markdown Capture

Pricing

$50.00 / 1,000 snapshots taken

Capture a clean HTML, plain-text, or markdown snapshot of any URL. Built for archival, change detection, and downstream LLM input. Stores rendered title, final URL, status, and fetched-at timestamp. $0.05 per snapshot. Free preview run.

Developer

Emily Ward

Maintained by Community

Universal Web Snapshot

Snapshot any URL via a connector chain: static HTML first, Playwright browser if static returns thin content, Wayback Machine if both fail.

Pure scraper. No API keys. No LLM dependency. Designed to be robust against the three most common scraping failure modes:

| Failure mode | Tier that handles it |
|---|---|
| Site returns HTML but it's a JS-only SPA (Notion, Linear, Figma) | Playwright browser |
| Site blocks datacenter IPs | Apify proxy routing |
| Site is down right now but has Wayback snapshots | Wayback Machine |

What you get back per URL

| Field | Description |
|---|---|
| input_url, final_url, status | Identity + HTTP response |
| tier_used | "static" / "browser" / "wayback" - tells you which connector won |
| archived_at | If Wayback was used, the snapshot timestamp |
| signals.title, og_title, meta_description, og_description, canonical_url | Standard meta fields |
| signals.h1[], h2[] | First N headings (typically your hero copy) |
| signals.json_ld[] | Any JSON-LD structured data the page exposes |
| text_excerpt (first 2000 chars), text_full (up to 20k), text_length | Cleaned text content |
| screenshot_url | Optional PNG (Playwright tier only) saved to actor KV store |
| tries[] | Per-tier result so you can see what was attempted |
| elapsed_ms | Total processing time |
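
As a rough illustration of working with these records, the sketch below tallies which tier produced each result and lists the URLs that were served from an archive. It uses only field names from the table above; the sample values, including the format of archived_at, are made up for the example and are not taken from real output.

```javascript
// Sample records: field names come from the table above, values are invented.
const sampleRecords = [
  { input_url: "https://example.com/pricing", tier_used: "static", status: 200, text_length: 5400 },
  { input_url: "https://example.org/app", tier_used: "browser", status: 200, text_length: 1200 },
  // The archived_at format here is an assumption.
  { input_url: "https://example.net/old", tier_used: "wayback", status: 200, text_length: 800, archived_at: "2024-01-15T00:00:00Z" },
];

// Count how many records each connector tier produced.
function countByTier(records) {
  const counts = { static: 0, browser: 0, wayback: 0 };
  for (const record of records) {
    if (Object.hasOwn(counts, record.tier_used)) counts[record.tier_used] += 1;
  }
  return counts;
}

// List URLs that were served from an archive, with the archive timestamp.
function archivedOnly(records) {
  return records
    .filter((r) => r.tier_used === "wayback")
    .map((r) => ({ url: r.input_url, archived_at: r.archived_at }));
}

console.log(countByTier(sampleRecords));
console.log(archivedOnly(sampleRecords));
```

A high share of "wayback" results in a monitoring run may be worth a closer look, since those snapshots describe the page as of archived_at rather than the time of the run.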

Pricing

$0.05 per successful snapshot. Failed URLs (all three tiers exhausted) are not charged.

| Use case | URLs | Cost |
|---|---|---|
| Quick competitor sweep | 20 | $1.00 |
| Daily monitoring (100 URLs/day) | 3,000/mo | $150/mo |
| One-off content audit | 500 | $25.00 |
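
The figures in the table follow from multiplying the number of successful snapshots by the per-snapshot price. A minimal sketch of that arithmetic, assuming the $0.05 price stated above and working in whole cents to avoid floating-point drift (the function name is illustrative):

```javascript
// Price per successful snapshot in cents, taken from the pricing section ($0.05).
const PRICE_CENTS = 5;

// Returns the cost in dollars. Pass only successful snapshots:
// failed URLs are not charged.
function estimateCostUsd(successfulSnapshots) {
  if (!Number.isInteger(successfulSnapshots) || successfulSnapshots < 0) {
    throw new RangeError("successfulSnapshots must be a non-negative integer");
  }
  return (successfulSnapshots * PRICE_CENTS) / 100;
}

console.log(estimateCostUsd(20));   // 1
console.log(estimateCostUsd(3000)); // 150
console.log(estimateCostUsd(500));  // 25
```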

How the connector chain works

1. Static fetch (cheap, fast)
   ├─ HTTP 200 + > 300 chars          -> Done. tier_used="static".
   └─ Thin/blocked/missing content    -> Continue to tier 2.
2. Playwright browser (heavier)
   ├─ Renders JS, waits for network   -> Done. tier_used="browser".
   └─ Browser failure / still thin    -> Continue to tier 3.
3. Wayback Machine (last resort)
   ├─ Finds archived snapshot         -> Done. tier_used="wayback".
   └─ No archive available            -> Error returned.

You are charged only for successful snapshots, regardless of how many tiers were tried.
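
The fall-through logic can be sketched as a loop over tiers that stops at the first usable result. This is an illustrative sketch, not the actor's source: runChain, isUsable, and the tier object shape are hypothetical names, and applying the "HTTP 200 + > 300 chars" check to every tier is an assumption drawn from the diagram.

```javascript
// Hypothetical sketch of a tiered fetch chain; not the actor's actual code.
const MIN_CHARS = 300; // threshold from the diagram: more than 300 chars

// A result is usable when it returned HTTP 200 and enough text.
function isUsable(result) {
  return Boolean(result) && result.status === 200 && (result.text || "").length > MIN_CHARS;
}

// tiers: array of { name, fetch }, where fetch(url) resolves to { status, text }.
async function runChain(url, tiers) {
  const tries = [];
  for (const tier of tiers) {
    try {
      const result = await tier.fetch(url);
      const ok = isUsable(result);
      tries.push({ tier: tier.name, ok, status: result ? result.status : null });
      if (ok) {
        return { input_url: url, tier_used: tier.name, status: result.status, text_full: result.text, tries };
      }
    } catch (err) {
      // A thrown error counts as a failed tier; move on to the next one.
      tries.push({ tier: tier.name, ok: false, error: String(err && err.message ? err.message : err) });
    }
  }
  return { input_url: url, tier_used: null, error: "all tiers exhausted", tries };
}
```

Because the loop returns on the first usable result, at most one success is recorded per URL however many tiers ran, which is consistent with charging once per successful snapshot.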

Use cases

  • Pricing intel: monitor competitor pricing pages even when they are JS-rendered.
  • Content audits: snapshot 500 URLs across a domain, get clean text for analysis.
  • AI training data: prepare structured input for LLM pipelines from any web source.
  • Compliance / legal: archive a copy of pages with a Wayback timestamp for evidence.
  • SEO research: extract title + meta + h1/h2 across competitor sites.
  • Investor due diligence: snapshot a company's site at a point in time.

Why this is a connector / plugin architecture

The actor's src/lib/scraping.js exposes:

  • fetchStatic(url, opts) - tier 1 implementation
  • fetchBrowser(url, opts) - tier 2 implementation
  • fetchWayback(url, opts) - tier 3 implementation
  • smartFetch(url, opts) - the orchestrator
  • cleanHtml(html), extractSignals(html), normalizeUrl(input) - shared utilities

Any future scraper actor can import the same lib. This means a Pricing Watcher v2 (or any other actor that needs robust scraping) gets the same multi-tier fetch for free. The pattern lives once.
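
To make the value of a shared utility concrete, here is one plausible shape for a URL normalizer. It is a hypothetical stand-in: the real normalizeUrl(input) is not shown in this README and may behave differently. The sketch assumes https when no scheme is given and drops the fragment, using the standard URL class.

```javascript
// Hypothetical stand-in for normalizeUrl(input); the real implementation may differ.
function normalizeUrlSketch(input) {
  const trimmed = String(input).trim();
  // Assumption: default to https when the caller omits the scheme.
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed);
  const url = new URL(hasScheme ? trimmed : `https://${trimmed}`); // throws TypeError if unparseable
  url.hash = ""; // fragments are never sent to the server
  return url.href;
}

console.log(normalizeUrlSketch("example.com"));
console.log(normalizeUrlSketch(" HTTPS://Example.com/a#top "));
```

Keeping this kind of helper in one library means every actor that imports it treats user-supplied URLs the same way before any tier runs.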

What this actor does NOT do

  • It does not log into authenticated sites.
  • It does not download non-HTML assets (PDFs, videos, etc).
  • It does not paginate within a single URL (use a separate crawler for that).

Tags

scraping web-snapshot wayback playwright connector static-fetch headless-chrome content-extraction seo


Made by Emily Ward, Cancel Costs.

Integrations

This actor works out of the box with every Apify-supported integration:

  • API: call via Apify API or any official SDK (Python, JavaScript, PHP, .NET). Returns a clean dataset URL.
  • Schedule: set a daily, weekly, or custom cron cadence in Apify Console. Combine with notifications for fresh feeds.
  • Webhooks: wire ACTOR.RUN.SUCCEEDED to Slack, Discord, Zapier, Make, n8n, Pipedream, or any HTTPS endpoint.
  • MCP: this actor is discoverable through Apify's hosted MCP server at mcp.apify.com for Claude, Cursor, Cline, Windsurf, and other MCP clients.
  • n8n / Make / Zapier: native HTTP-Request integration. Trigger the actor on schedule, pipe results to Google Sheets, Airtable, your CRM, or any database.
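
For the API route, the sketch below builds a request for Apify's synchronous run endpoint (run-sync-get-dataset-items), which starts a run and returns the dataset items in the response. The actor ID "username~actor-name" and the `urls` input field are placeholders: this README does not state the actor's ID or input schema, so check both on the actor's page before use.

```javascript
// Builds the arguments for fetch() against Apify's synchronous run endpoint.
// The actor ID and the shape of `input` are placeholders, not taken from this README.
function buildRunRequest(actorId, token, input) {
  const url = `https://api.apify.com/v2/acts/${encodeURIComponent(actorId)}/run-sync-get-dataset-items`;
  return {
    url,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json", Authorization: `Bearer ${token}` },
      body: JSON.stringify(input),
    },
  };
}

const request = buildRunRequest("username~actor-name", "YOUR_APIFY_TOKEN", {
  urls: ["https://example.com"], // hypothetical input field
});
console.log(request.url);
// To execute inside an async function:
//   const items = await (await fetch(request.url, request.options)).json();
```

Separating request construction from the network call keeps the example easy to inspect and lets the same request object be reused from n8n, Make, or any other HTTP client.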

Try it free

Every Apify user gets $5/month in free platform credits (around 100 snapshots at this actor's $0.05 per-snapshot price). Run preview mode first to confirm the output shape before scaling.

New to Apify? Sign up to get free credits.

What's New

  • 2026-06-03: Metadata, categories, and SEO refreshed. Latest version live on Apify Store.

Last Updated

2026-06-03