Universal Web Snapshot - HTML, Text, Markdown Capture
Pricing
$50.00 / 1,000 snapshots taken
Capture a clean HTML, plain-text, or markdown snapshot of any URL. Built for archival, change detection, and downstream LLM input. Stores the rendered title, final URL, status, and fetched-at timestamp. $0.05 per snapshot. Free preview run.
Universal Web Snapshot
Snapshot any URL via a connector chain: static HTML first, Playwright browser if static returns thin content, Wayback Machine if both fail.
Pure scraper. No API keys. No LLM dependency. Designed to be robust against the three most common scraping failure modes:
| Failure mode | Tier that handles it |
|---|---|
| Site returns HTML but it's a JS-only SPA (Notion, Linear, Figma) | Playwright browser |
| Site blocks datacenter IPs | Apify proxy routing |
| Site is down right now but has Wayback snapshots | Wayback Machine |
What you get back per URL
| Field | Description |
|---|---|
| `input_url`, `final_url`, `status` | Identity + HTTP response |
| `tier_used` | "static" / "browser" / "wayback" - tells you which connector won |
| `archived_at` | If Wayback was used, the snapshot timestamp |
| `signals.title`, `og_title`, `meta_description`, `og_description`, `canonical_url` | Standard meta fields |
| `signals.h1[]`, `h2[]` | First N headings (typically your hero copy) |
| `signals.json_ld[]` | Any JSON-LD structured data the page exposes |
| `text_excerpt` (first 2000 chars), `text_full` (up to 20k), `text_length` | Cleaned text content |
| `screenshot_url` | Optional PNG (Playwright tier only) saved to actor KV store |
| `tries[]` | Per-tier result so you can see what was attempted |
| `elapsed_ms` | Total processing time |
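
As a rough illustration of working with these records, the sketch below counts which tier won across a batch. The sample records are invented for the example (including the `archived_at` format, which is an assumption); only the field names come from the table above, and `summarizeTiers` is a hypothetical helper, not part of the actor.

```javascript
// Hypothetical sample records. Field names follow the table above;
// every value (and the archived_at format) is made up for illustration.
const records = [
  { input_url: 'https://example.com', final_url: 'https://example.com/', status: 200, tier_used: 'static', text_length: 5400, elapsed_ms: 310 },
  { input_url: 'https://example.org/app', final_url: 'https://example.org/app', status: 200, tier_used: 'browser', text_length: 1200, elapsed_ms: 4200 },
  { input_url: 'https://example.net/old', final_url: 'https://example.net/old', status: 200, tier_used: 'wayback', archived_at: '2024-01-15T00:00:00Z', text_length: 800, elapsed_ms: 2500 },
];

// Count how many records were served by each tier.
function summarizeTiers(items) {
  const counts = { static: 0, browser: 0, wayback: 0 };
  for (const item of items) {
    // Ignore records with a missing or unexpected tier_used value.
    if (Object.hasOwn(counts, item.tier_used)) {
      counts[item.tier_used] += 1;
    }
  }
  return counts;
}

const tierCounts = summarizeTiers(records);
console.log(tierCounts); // { static: 1, browser: 1, wayback: 1 }
```

A high share of "browser" or "wayback" results in a real run would suggest the target sites are JS-heavy or frequently unavailable.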
Pricing
$0.05 per successful snapshot. Failed URLs (all three tiers exhausted) are not charged.
| Use case | URLs | Cost |
|---|---|---|
| Quick competitor sweep | 20 | $1.00 |
| Daily monitoring (100 URLs/day) | 3,000/mo | $150/mo |
| One-off content audit | 500 | $25.00 |
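
The table rows follow directly from the per-snapshot price. A minimal sketch of the arithmetic, assuming the $0.05 price stated above and that only successful snapshots are billed (`estimateCostUsd` is a hypothetical helper name):

```javascript
// Price per successful snapshot, in cents, taken from the pricing line above.
const PRICE_PER_SNAPSHOT_CENTS = 5;

// Estimate the charge in USD. Failed URLs are not charged, so pass only
// the number of snapshots that succeeded.
function estimateCostUsd(successfulSnapshots) {
  // Multiply in whole cents first to avoid floating-point drift.
  return (successfulSnapshots * PRICE_PER_SNAPSHOT_CENTS) / 100;
}

console.log(estimateCostUsd(20));   // 1
console.log(estimateCostUsd(3000)); // 150
console.log(estimateCostUsd(500));  // 25
```

For a run where some URLs fail on all three tiers, subtract the failures before estimating, for example `estimateCostUsd(attempted - failed)`.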
How the connector chain works
1. Static fetch (cheap, fast)
   - HTTP 200 + > 300 chars: done, `tier_used="static"`.
   - Thin/blocked/missing content: fall through to the next tier.
2. Playwright browser (heavier)
   - Renders JS, waits for network: done, `tier_used="browser"`.
   - Browser failure / still thin: fall through to the next tier.
3. Wayback Machine (last resort)
   - Finds archived snapshot: done, `tier_used="wayback"`.
   - No archive available: error returned.
The buyer is only charged for successful snapshots, regardless of how many tiers were tried.
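
The fallback logic described above can be sketched roughly as follows. This is not the actor's actual orchestrator: `fetchWithFallback`, the `{ name, fetch }` tier shape, and the stub tiers are all hypothetical, and applying the same "HTTP 200 and more than 300 characters" check to every tier is an assumption made to keep the example short.

```javascript
// Threshold from the chain description: a result must exceed 300 chars to count.
const MIN_TEXT_LENGTH = 300;

// Try each tier in order and return the first usable result.
// `tiers` is an array of { name, fetch }, where fetch(url) resolves to
// { status, text } or throws. This shape is an assumption for the sketch.
async function fetchWithFallback(url, tiers) {
  const tries = [];
  for (const tier of tiers) {
    try {
      const result = await tier.fetch(url);
      const ok = result.status === 200 && result.text.length > MIN_TEXT_LENGTH;
      tries.push({ tier: tier.name, ok, status: result.status });
      if (ok) {
        return { input_url: url, tier_used: tier.name, status: result.status, text_length: result.text.length, tries };
      }
    } catch (err) {
      // A thrown error counts as a failed tier; record it and move on.
      tries.push({ tier: tier.name, ok: false, error: String(err.message || err) });
    }
  }
  // Every tier failed or returned thin content.
  return { input_url: url, error: 'all tiers exhausted', tries };
}

// Stub tiers standing in for real fetchers: a thin SPA shell,
// a crashing browser, and an archive hit.
const stubTiers = [
  { name: 'static', fetch: async () => ({ status: 200, text: '<div id="root"></div>' }) },
  { name: 'browser', fetch: async () => { throw new Error('browser crashed'); } },
  { name: 'wayback', fetch: async () => ({ status: 200, text: 'x'.repeat(1000) }) },
];

const demo = fetchWithFallback('https://example.com', stubTiers);
demo.then((r) => console.log(r.tier_used, r.tries.length)); // wayback 3
```

Recording every attempt in `tries` mirrors the `tries[]` output field, so a caller can see why the cheaper tiers were skipped.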
Use cases
- Pricing intel: monitor competitor pricing pages even when they are JS-rendered.
- Content audits: snapshot 500 URLs across a domain, get clean text for analysis.
- AI training data: prepare structured input for LLM pipelines from any web source.
- Compliance / legal: archive a copy of pages with a Wayback timestamp for evidence.
- SEO research: extract title + meta + h1/h2 across competitor sites.
- Investor due diligence: snapshot a company's site at a point in time.
Why this is a connector / plugin architecture
The actor's `src/lib/scraping.js` exposes:
- `fetchStatic(url, opts)` - tier 1 implementation
- `fetchBrowser(url, opts)` - tier 2 implementation
- `fetchWayback(url, opts)` - tier 3 implementation
- `smartFetch(url, opts)` - the orchestrator
- `cleanHtml(html)`, `extractSignals(html)`, `normalizeUrl(input)` - shared utilities
Any future scraper actor can import the same lib. This means a Pricing Watcher v2 (or any other actor that needs robust scraping) gets the same multi-tier fetch for free. The pattern lives once.
What this actor does NOT do
- It does not log into authenticated sites.
- It does not download non-HTML assets (PDFs, videos, etc).
- It does not paginate within a single URL (use a separate crawler for that).
Tags
scraping web-snapshot wayback playwright connector static-fetch headless-chrome content-extraction seo
Made by Emily Ward, Cancel Costs.
Integrations
This actor works out of the box with every Apify-supported integration:
- API: call via Apify API or any official SDK (Python, JavaScript, PHP, .NET). Returns a clean dataset URL.
- Schedule: set a daily, weekly, or custom cron cadence in Apify Console. Combine with `notification` for fresh feeds.
- Webhooks: wire `ACTOR.RUN.SUCCEEDED` to Slack, Discord, Zapier, Make, n8n, Pipedream, or any HTTPS endpoint.
- MCP: this actor is discoverable through Apify's hosted MCP server at `mcp.apify.com` for Claude, Cursor, Cline, Windsurf, and other MCP clients.
- n8n / Make / Zapier: native HTTP-Request integration. Trigger the actor on schedule, pipe results to Google Sheets, Airtable, your CRM, or any database.
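
For the API route, one possible shape is a plain HTTPS call to Apify's synchronous run endpoint, which starts a run and returns the dataset items in the response. The sketch below only builds the request. The actor ID, the token placeholder, and the `urls` input field are assumptions (this page does not state the actor's ID or input schema), and `buildRunRequest` is a hypothetical helper.

```javascript
// Build the URL and fetch options for Apify's "run actor synchronously and
// get dataset items" endpoint. Nothing is sent until fetch() is called.
function buildRunRequest(actorId, token, input) {
  // The Apify API addresses actors as "username~actor-name".
  const url = `https://api.apify.com/v2/acts/${encodeURIComponent(actorId)}/run-sync-get-dataset-items`;
  return {
    url,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify(input),
    },
  };
}

// Placeholder actor ID and token; the `urls` field name is an assumption
// about the input schema, so check the actor's input tab before relying on it.
const request = buildRunRequest(
  'your-username~your-actor',
  'YOUR_APIFY_TOKEN',
  { urls: ['https://example.com'] },
);
console.log(request.url);

// To send it (Node 18+ or a browser), inside an async function:
//   const response = await fetch(request.url, request.options);
//   const items = await response.json();
```

Keeping request construction separate from the network call makes it easy to log or unit-test the payload before spending credits.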
Try it free
Every Apify user gets $5/month in free platform credits (around 100 events at this actor's $0.05 per-event price). Run preview mode first to confirm the output shape before scaling.
New to Apify? Sign up to get free credits.
What's New
- 2026-06-03: Metadata, categories, and SEO refreshed. Latest version live on Apify Store.
Last Updated
2026-06-03