Pricing

$3.00 / 1,000 snapshots taken

Universal Web Snapshot - HTML, Text, Markdown Capture

Capture a clean HTML, plain-text, or markdown snapshot of any URL. Built for archival, change detection, and downstream LLM input. Stores rendered title, final URL, status, fetched-at timestamp. $0.003 per snapshot. Free preview run.

Developer

Emily Ward

Last modified

a month ago

Universal Web Snapshot

Snapshot any URL via a connector chain: static HTML first, Playwright browser if static returns thin content, Wayback Machine if both fail.

Pure scraper. No API keys. No LLM dependency. Designed to be robust against the three most common scraping failure modes:

Failure mode	What handles it
Site returns HTML but it's a JS-only SPA (Notion, Linear, Figma)	Playwright browser
Site blocks datacenter IPs	Apify proxy routing
Site is down right now but has Wayback snapshots	Wayback Machine

What you get back per URL

Field	Description
`input_url`, `final_url`, `status`	Identity + HTTP response
`tier_used`	"static" / "browser" / "wayback" - tells you which connector won
`archived_at`	If Wayback was used, the snapshot timestamp
`signals.title`, `og_title`, `meta_description`, `og_description`, `canonical_url`	Standard meta fields
`signals.h1[]`, `h2[]`	First N headings (typically your hero copy)
`signals.json_ld[]`	Any JSON-LD structured data the page exposes
`text_excerpt` (first 2000 chars), `text_full` (up to 20k), `text_length`	Cleaned text content
`screenshot_url`	Optional PNG (Playwright tier only) saved to actor KV store
`tries[]`	Per-tier result so you can see what was attempted
`elapsed_ms`	Total processing time
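
As a rough illustration of the table above, one output record might look like the object below. The field names come from the table; every value is invented, and the nesting of the meta fields under `signals` and the shape of each `tries[]` entry are assumptions, since the source does not spell them out. `describeSnapshot` is a hypothetical helper, not part of the actor.

```javascript
// Illustrative record only. Field names follow the table above;
// all values, the `signals` nesting, and the `tries[]` entry shape are assumptions.
const text = "Simple pricing for every team.";

const exampleRecord = {
  input_url: "example.com/pricing",
  final_url: "https://example.com/pricing",
  status: 200,
  tier_used: "browser", // "static" | "browser" | "wayback"
  archived_at: null, // only set when the Wayback tier was used
  signals: {
    title: "Pricing - Example",
    og_title: "Pricing",
    meta_description: "Plans and prices.",
    og_description: "Plans and prices.",
    canonical_url: "https://example.com/pricing",
    h1: ["Simple pricing"],
    h2: ["Starter", "Team"],
    json_ld: [],
  },
  text_excerpt: text.slice(0, 2000), // first 2000 chars
  text_full: text.slice(0, 20000), // up to 20k
  text_length: text.length,
  screenshot_url: null, // optional, Playwright tier only
  tries: [
    { tier: "static", ok: false },
    { tier: "browser", ok: true },
  ],
  elapsed_ms: 4210,
};

// Hypothetical helper: one-line summary of a record for logs or reports.
function describeSnapshot(record) {
  const source =
    record.tier_used === "wayback" && record.archived_at
      ? `wayback@${record.archived_at}`
      : record.tier_used;
  return `${record.status} ${record.final_url} via ${source} (${record.text_length} chars)`;
}

console.log(describeSnapshot(exampleRecord));
```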

Pricing

$0.003 per successful snapshot ($3.00 per 1,000). Failed URLs (all three tiers exhausted) are not charged.

Use case	URLs	Cost
Quick competitor sweep	20	$0.06
Daily monitoring (100 URLs/day)	3,000/mo	$9.00/mo
One-off content audit	500	$1.50
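
For other volumes, the arithmetic can be sketched as below. This assumes the $3.00 per 1,000 snapshots rate shown in the listing header; `estimateCostUsd` is a hypothetical helper, and the rate is a parameter so you can pass whatever price your run is actually billed at.

```javascript
// Rate from the listing header: $3.00 per 1,000 snapshots taken.
const PRICE_PER_1000_USD = 3.0;

// Only successful snapshots are billed, so pass the count you expect to succeed.
function estimateCostUsd(successfulSnapshots, pricePer1000 = PRICE_PER_1000_USD) {
  if (!Number.isInteger(successfulSnapshots) || successfulSnapshots < 0) {
    throw new RangeError("successfulSnapshots must be a non-negative integer");
  }
  return (successfulSnapshots * pricePer1000) / 1000;
}

console.log(estimateCostUsd(500).toFixed(2)); // prints "1.50"
```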

How the connector chain works

1. Static fetch (cheap, fast)        ─┐
   ├─ HTTP 200 + > 300 chars         │ Done. tier_used="static".
   └─ Thin/blocked/missing content   ─┤
                                      │
2. Playwright browser (heavier)      ─┤
   ├─ Renders JS, waits for network  │ Done. tier_used="browser".
   └─ Browser failure / still thin   ─┤
                                      │
3. Wayback Machine (last resort)     ─┤
   ├─ Finds archived snapshot        │ Done. tier_used="wayback".
   └─ No archive available           ─┘ Error returned.

You are charged only for successful snapshots, regardless of how many tiers were tried.
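
The chain above could be sketched roughly as follows. This is not the actor's source code: `smartFetchSketch`, `isGoodEnough`, the `charged` flag, and the `{ name, run }` tier shape are hypothetical names, and the tiers are stubs so the example runs without network access. It also applies the same "HTTP 200 and more than 300 chars" check to every tier, which is a simplification of the diagram.

```javascript
// Threshold taken from the diagram: "HTTP 200 + > 300 chars".
const MIN_CHARS = 300;

// A result counts as a success when it is HTTP 200 with enough text.
function isGoodEnough(result) {
  return Boolean(result) && result.status === 200 && (result.text || "").length > MIN_CHARS;
}

// Try each tier in order, record every attempt, stop at the first success.
async function smartFetchSketch(url, tiers) {
  const tries = [];
  for (const { name, run } of tiers) {
    try {
      const result = await run(url);
      const ok = isGoodEnough(result);
      tries.push({ tier: name, ok, status: result ? result.status : null });
      if (ok) return { ...result, tier_used: name, tries, charged: true };
    } catch (err) {
      // A thrown error counts as a failed tier; move on to the next one.
      tries.push({ tier: name, ok: false, error: String(err && err.message ? err.message : err) });
    }
  }
  // All tiers exhausted: an error is returned and nothing is billed.
  return { error: "all tiers exhausted", tier_used: null, tries, charged: false };
}

// Usage with stub tiers: static returns thin content, so the browser tier wins.
const stubTiers = [
  { name: "static", run: async () => ({ status: 200, text: "thin" }) },
  { name: "browser", run: async () => ({ status: 200, text: "x".repeat(500) }) },
  { name: "wayback", run: async () => { throw new Error("not reached"); } },
];

smartFetchSketch("https://example.com", stubTiers).then((out) => {
  console.log(out.tier_used, out.tries.length); // prints "browser 2"
});
```

Keeping the per-tier attempts in one array mirrors the `tries[]` field, so a caller can see what was attempted even when an early tier succeeds.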

Use cases

Pricing intel: monitor competitor pricing pages even when they are JS-rendered.
Content audits: snapshot 500 URLs across a domain, get clean text for analysis.
AI training data: prepare structured input for LLM pipelines from any web source.
Compliance / legal: archive a copy of pages with a Wayback timestamp for evidence.
SEO research: extract title + meta + h1/h2 across competitor sites.
Investor due diligence: snapshot a company's site at a point in time.

Why this is a connector / plugin architecture

The actor's src/lib/scraping.js exposes:

fetchStatic(url, opts) - tier 1 implementation
fetchBrowser(url, opts) - tier 2 implementation
fetchWayback(url, opts) - tier 3 implementation
smartFetch(url, opts) - the orchestrator
cleanHtml(html), extractSignals(html), normalizeUrl(input) - shared utilities

Any future scraper actor can import the same lib. This means a Pricing Watcher v2 (or any other actor that needs robust scraping) gets the same multi-tier fetch for free. The pattern lives once.

What this actor does NOT do

It does not log into authenticated sites.
It does not download non-HTML assets (PDFs, videos, etc.).
It does not paginate within a single URL (use a separate crawler for that).

Pairs well with

pricing-page-watcher: Snapshot competitor pricing pages, diff over time. $0.005 per check.
shopify-store-detector: Snapshot Shopify pages and extract stack. $0.03 per store.
wordpress-stack-detector: Snapshot WP pages and detect plugins. $0.02 per site.

Integrations

This actor works out of the box with every Apify-supported integration:

API: call via the Apify API or an official API client (JavaScript, Python). Returns a clean dataset URL.
Schedule: set a daily, weekly, or custom cron cadence in Apify Console. Combine with notifications to keep feeds fresh.
Webhooks: wire ACTOR.RUN.SUCCEEDED to Slack, Discord, Zapier, Make, n8n, Pipedream, or any HTTPS endpoint.
MCP: this actor is discoverable through Apify's hosted MCP server at mcp.apify.com for Claude, Cursor, Cline, Windsurf, and other MCP clients.
n8n / Make / Zapier: native HTTP-Request integration. Trigger the actor on schedule, pipe results to Google Sheets, Airtable, your CRM, or any database.
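
A minimal sketch of the API route, assuming Apify's synchronous "run Actor and get dataset items" endpoint and Node 18+ for the global `fetch`. The actor ID and the `urls` input field are placeholders, because the source states neither this actor's ID nor its input schema; `buildRunRequest` and `runSnapshot` are hypothetical helper names.

```javascript
// Builds the request for Apify's run-sync-get-dataset-items endpoint.
// actorId and the input shape are placeholders, not confirmed by the source.
function buildRunRequest(actorId, token, input) {
  const id = actorId.replace("/", "~"); // the API accepts "username~actor-name"
  return {
    url: `https://api.apify.com/v2/acts/${encodeURIComponent(id)}/run-sync-get-dataset-items`,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify(input),
    },
  };
}

// Runs the actor and resolves with the dataset items (one record per URL).
async function runSnapshot(actorId, token, urls) {
  const { url, options } = buildRunRequest(actorId, token, { urls }); // `urls` is an assumed field name
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`Apify API returned HTTP ${res.status}`);
  return res.json();
}
```

Call it as `runSnapshot("<username>/<actor-name>", process.env.APIFY_TOKEN, ["https://example.com"])`, substituting the real actor ID and the input fields from the actor's input schema.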

Try it free

Every Apify user gets $5/month in free platform credits (about 1,660 events at this actor's $0.003 per-event price). Run preview mode first to confirm the output shape before scaling.

New to Apify? Sign up to get free credits.

What's New

2026-06-03: Metadata, categories, and SEO refreshed. Latest version live on Apify Store.

Last Updated

2026-06-03

Wayback Machine Bulk Lookup

jungle_synthesizer/wayback-machine-bulk-lookup

Look up Wayback Machine snapshots for any URL or list of URLs. Returns capture timeline, optional snapshot markdown, and live-vs-snapshot diff. Date range filtering, capture limit, bulk input. Built for OSINT, journalism, SEO link-rot recovery, and legal evidence.

BowTiedRaccoon

Website to Markdown Crawler for LLM & RAG

logiover/website-text-markdown-crawler

Crawl any website to clean Markdown and plain text for LLM training and RAG. HTML to Markdown, no API or login. Export website text to CSV or JSON.

Logiover

Wayback Machine Snapshots Scraper — Internet Archive History

seemuapps/wayback-machine-snapshots-scraper

List every Internet Archive snapshot of a URL, page, or whole domain. Timestamp, snapshot URL, status code, mime type, content length. No login.

Andrew

Document Format Converter — Markdown, HTML & Text

junipr/document-format-converter

Convert Markdown, HTML, plain text, JSON, and CSV-style documents into clean automation-ready formats with downloadable output files.

junipr

HTML to Markdown Converter API — URL or Raw HTML to GFM

eliai/html-to-markdown

Convert HTML to Markdown via API. Input: a URL or raw HTML string. Output: clean GitHub-flavored Markdown with headings, links, lists, tables, and code blocks preserved. Sync run-and-return. Cheap pay-per-result: $0.02 per page converted.

Anthony Snider

Website Change Monitor

agentictools/website-change-monitor

Watch any web pages and detect changes between runs. Stores a snapshot and reports what changed.

Ken Agland

Web Page to Markdown Extractor

fetch_cat/web-page-to-markdown-extractor

Convert public URLs into clean Markdown, text, metadata, links, images, and optional HTML for AI and automation workflows.

Hanna Nosova

Website Content Scraper: Clean Markdown for AI and RAG

scrapemint/website-content-scraper

Crawl any website and get clean markdown, text, or HTML per page, ready for RAG pipelines, chatbots, and LLM fine tuning. Plain HTTP, no browser, no API key. Pay per page.