Website Content Extractor
Extract clean text and markdown from docs, pricing, product, policy, and help-center URLs for RAG datasets and content operations.
Pricing
Pay per event
Rating: 0.0 (0)
Developer: 太郎 山田
Maintained by Community
Actor stats
- Bookmarked: 1
- Total users: 18
- Monthly active users: 3
- Last modified: 2 days ago
After this run
Turn this Actor's output into a capped paid report with SaaS Pricing Page Monitor & Competitor Price Change Alerts. Use it when SaaS founders, product marketers, and pricing teams need to decide whether a public competitor pricing page changed in a way that affects packaging or sales messaging.
- First report: $3 / pricing_snapshot_report; set maxChargeUsd to $3.
- Deeper report: $15 / competitor_pricing_report; use only when the first result needs more competitor or action depth.
- This is an internal Apify flow aid. It is not revenue proof until accounted paid usage appears.
Next report-style Actors
If you already have data from this Actor, these follow-on Actors turn public or user-provided inputs into decision-ready reports. They are optional, capped by maxChargeUsd, and do not make business outcome claims.
- Website RAG Readiness Audit Report - decide whether extracted public pages are clean enough for RAG before embedding.
- SaaS Pricing Page Monitor - turn public pricing pages into competitor pricing action reports.
- Ad Landing Page Offer Intelligence - turn landing pages into CRO offer and proof checklists.
AI builders, content ops, SEO teams, and documentation teams use this Actor to turn public website pages supplied by the user into a clean dataset for the Site QA & Content Intelligence Pack. Provide focused source inputs, keep the first run small, and expand only once the output shape proves useful. Each emitted row includes source context, timestamps, and fields designed for monitoring, QA, research, or workflow handoff.
Store Quickstart
Start with 5 to 20 URLs from one domain, review the quality of the extracted markdown, then expand to a full sitemap or scheduled checks.
Recommended first run:
{"urls": ["https://example.com/docs"],"outputFormat": "markdown","limit": 10,"delivery": "dataset","dryRun": false}
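The quickstart advice (a handful of URLs, all from one domain) can be checked before a run is submitted. The following is a minimal sketch that assumes only the input keys shown in the example above; `build_first_run_input` and its `max_urls` parameter are hypothetical helpers for illustration, and the Actor itself is not known to enforce these checks.

```python
from urllib.parse import urlparse


def build_first_run_input(urls, output_format="markdown", limit=10, max_urls=20):
    """Build a small first-run input using the keys from the quickstart example.

    Hypothetical helper: the one-domain and max_urls checks mirror the
    quickstart advice, not any documented Actor validation.
    """
    if not urls:
        raise ValueError("provide at least one URL")
    if len(urls) > max_urls:
        raise ValueError(f"keep the first run to at most {max_urls} URLs")
    hosts = {urlparse(u).netloc.lower() for u in urls}
    if "" in hosts:
        raise ValueError("every URL needs a scheme and host, e.g. https://example.com/docs")
    if len(hosts) > 1:
        raise ValueError(f"keep the first run to one domain, got: {sorted(hosts)}")
    return {
        "urls": list(urls),
        "outputFormat": output_format,
        "limit": limit,
        "delivery": "dataset",
        "dryRun": False,
    }
```

Calling `build_first_run_input(["https://example.com/docs"])` reproduces the recommended first-run input; mixing two domains raises a `ValueError` instead.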
Input examples
Docs pages
{"urls": ["https://example.com/docs"],"outputFormat": "markdown","limit": 10,"delivery": "dataset","dryRun": false}
Pricing and product pages
{"urls": ["https://example.com/pricing","https://example.com/product"],"outputFormat": "text","limit": 20,"delivery": "dataset","dryRun": false}
Webhook handoff
{"urls": ["https://example.com/help"],"outputFormat": "markdown","delivery": "webhook","webhookUrl": "https://example.com/webhook","dryRun": false}
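The three examples suggest a pattern: `outputFormat` is `markdown` or `text`, `delivery` is `dataset` or `webhook`, and the webhook case also carries a `webhookUrl`. The sketch below checks an input against that inferred pattern. The allowed values are assumptions drawn from the examples, not from a published input schema, and `check_input` is a hypothetical name.

```python
ALLOWED_FORMATS = {"markdown", "text"}      # assumption: values seen in the examples
ALLOWED_DELIVERY = {"dataset", "webhook"}   # assumption: values seen in the examples


def check_input(run_input):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if not run_input.get("urls"):
        problems.append("urls must be a non-empty list")
    if run_input.get("outputFormat") not in ALLOWED_FORMATS:
        problems.append(f"unexpected outputFormat: {run_input.get('outputFormat')!r}")
    delivery = run_input.get("delivery")
    if delivery not in ALLOWED_DELIVERY:
        problems.append(f"unexpected delivery: {delivery!r}")
    elif delivery == "webhook" and not run_input.get("webhookUrl"):
        problems.append("delivery is 'webhook' but webhookUrl is missing")
    return problems
```

Running this locally before a paid run is a cheap way to catch a missing `webhookUrl`; the Actor's own validation may be stricter or looser.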
Sample output
{
  "meta": {
    "actorName": "website-content-extractor",
    "actorTitle": "Website Content Extractor",
    "bundle": "Site QA & Content Intelligence Pack",
    "fetchedAt": "2026-05-06T00:00:00.000Z",
    "totalRows": 1
  },
  "rows": [
    {
      "actorName": "website-content-extractor",
      "rowType": "web_content",
      "url": "https://example.com/docs",
      "title": "Example Docs",
      "markdown": "# Example Docs\nUseful content.",
      "wordCount": 240,
      "sourceUrl": "https://example.com/docs",
      "fetchedAt": "2026-05-06T00:00:00.000Z"
    }
  ],
  "warnings": []
}
Output fields
- rowType
- url
- title
- markdown
- text
- wordCount
- metadata
- sourceUrl
- fetchedAt
Rows also include source URLs, fetch timestamps, warnings when a source is partial, and stable IDs when the workflow supports recurring change detection.
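For RAG or QA handoff, the output can be reduced to (URL, content) pairs. This is a minimal sketch that assumes the payload has the shape of the sample output, and that a row carries either `markdown` or `text` depending on `outputFormat` (an inference from the field list, not a documented guarantee). `content_pairs` and `min_words` are hypothetical names.

```python
def content_pairs(payload, min_words=1):
    """Return (url, content) pairs for rows that have enough content."""
    pairs = []
    for row in payload.get("rows", []):
        # Assumption: a row holds markdown or text, depending on outputFormat.
        content = row.get("markdown") or row.get("text") or ""
        if content and row.get("wordCount", 0) >= min_words:
            pairs.append((row["url"], content))
    return pairs


# Trimmed copy of the sample output, written as a Python dict.
sample = {
    "meta": {"actorName": "website-content-extractor", "totalRows": 1},
    "rows": [
        {
            "rowType": "web_content",
            "url": "https://example.com/docs",
            "title": "Example Docs",
            "markdown": "# Example Docs\nUseful content.",
            "wordCount": 240,
            "sourceUrl": "https://example.com/docs",
            "fetchedAt": "2026-05-06T00:00:00.000Z",
        }
    ],
    "warnings": [],
}

pairs = content_pairs(sample, min_words=100)
```

Raising `min_words` is one simple way to keep thin pages out of an embedding job; the right threshold depends on the content.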
See also (Content extraction cluster)
- Article Content Extractor & Reader Scraper - article-specific extraction (byline, publish date, hero image) for news, blog, and press URLs.
Pricing and no-change runs
Pricing is $0.001 per Actor start plus $0.009 per useful content row. Failed or no-content rows should stay out of the default dataset.
The default dataset is the billable surface. Dry runs, validation-only runs, missing-key warnings, and unchanged recurring polls should not write payable default-dataset rows.
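The two prices combine into a simple estimate: one start fee plus a per-row fee for each useful content row. The sketch below works through that arithmetic with `Decimal` to avoid floating-point drift. It assumes a per-run cap (such as `maxChargeUsd`) is applied to this total, which this listing does not spell out for this Actor; the function names are hypothetical.

```python
from decimal import Decimal

ACTOR_START_USD = Decimal("0.001")  # per Actor start, from the pricing line
CONTENT_ROW_USD = Decimal("0.009")  # per useful content row, from the pricing line


def estimate_cost(useful_rows):
    """Estimated charge in USD for one run that writes `useful_rows` billable rows."""
    return ACTOR_START_USD + CONTENT_ROW_USD * useful_rows


def rows_within_cap(max_charge_usd):
    """Largest number of billable rows that fits under a cap (assumed per-run)."""
    budget = Decimal(str(max_charge_usd)) - ACTOR_START_USD
    if budget < 0:
        return 0
    return int(budget // CONTENT_ROW_USD)
```

Under these prices a 10-row run comes to $0.001 + 10 × $0.009 = $0.091, and a run with no useful rows costs only the $0.001 start fee.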
Compliance guardrails
- Fetch public pages supplied by the user.
- Respect site policies, rate limits, and robots guidance where applicable.
- Use output for content operations, QA, and RAG workflows.
- Do not use provider emblems or wording that implies approval by an upstream data provider.
See also
Related report Actors
Use these follow-on Actors when you want a capped, decision-ready report instead of more raw rows. They use public or user-provided inputs, respect maxChargeUsd, and do not promise rankings, revenue, conversion lifts, or sales outcomes.
- Website RAG Readiness Audit - turn public pages into a RAG-readiness score and cleanup actions.
- SaaS Pricing Page Monitor - turn public pricing pages into competitor pricing decision reports.
- Ad Landing Page Offer Intelligence - turn public landing pages into CRO offer and proof checklists.
Related paid report workflows
If this Actor gave you raw rows or source context, these follow-on report Actors are designed for a small capped paid run. They help make a decision, not just collect more data.
- Website RAG Readiness Audit Report - decide whether public website pages are clean and complete enough for RAG ingestion. Entry $9 / website_rag_snapshot_report; premium $29 / website_rag_readiness_report.
- SaaS Pricing Page Monitor & Competitor Price Change Alerts - decide whether a public competitor pricing page changed in a way that affects packaging or sales messaging. Entry $3 / pricing_snapshot_report; premium $15 / competitor_pricing_report.
- Ad Landing Page Offer Intelligence & CRO Gap Report - decide which public landing-page offer gaps to fix before increasing ad spend. Entry $3 / landing_offer_report; premium $15 / cro_gap_report_pack.
Keep maxChargeUsd equal to the selected tier. Internal links are traffic aids only; real proof requires accounted paid usage.
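The rule of keeping `maxChargeUsd` equal to the selected tier can be expressed as a lookup from event name to price. This is a small sketch using the event names and prices listed above. Whether `maxChargeUsd` is passed as an input field or as a run option is not stated here, so the helper only returns the value; `max_charge_for` is a hypothetical name.

```python
# Event name -> tier price in USD, copied from the report list above.
REPORT_TIERS_USD = {
    "website_rag_snapshot_report": 9,
    "website_rag_readiness_report": 29,
    "pricing_snapshot_report": 3,
    "competitor_pricing_report": 15,
    "landing_offer_report": 3,
    "cro_gap_report_pack": 15,
}


def max_charge_for(event_name):
    """Return the cap that matches the selected report tier."""
    try:
        return REPORT_TIERS_USD[event_name]
    except KeyError:
        raise ValueError(f"unknown report event: {event_name!r}") from None
```

Failing loudly on an unknown event name avoids running a report with no cap at all.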