Developer: 太陽山田 · Pricing: Pay per event
# Website Content Extractor
Extract clean main content from any webpage as text, markdown, or HTML. Removes nav, ads, scripts. Perfect for RAG pipelines and LLM training.
## Store Quickstart

Start with the Quickstart preset (3 demo pages, markdown output). For LLM data prep, use the RAG Pipeline preset (200 URLs, markdown + metadata).
## Key Features

- **Readability-style extraction**: removes nav, sidebars, ads, and scripts, keeping only the main content
- **Multiple output formats**: plain text, markdown, or cleaned HTML
- **Rich metadata**: title, author, publish date, description, canonical URL
- **Word count**: per-page stats for content analysis
- **Any webpage**: blog posts, articles, documentation, product pages
- **No API key needed**: pure HTTP + heuristic content extraction
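The "heuristic content extraction" named above can be illustrated with a minimal, stdlib-only sketch. This is not the actor's actual implementation; the tag set and the simple depth counter are my assumptions:

```python
from html.parser import HTMLParser

# Tags whose whole subtree is treated as boilerplate and skipped.
BOILERPLATE_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class _MainTextParser(HTMLParser):
    """Collects visible text while skipping boilerplate subtrees.

    Sketch only: nesting is tracked with a plain counter, so unclosed
    void tags inside a skipped subtree would confuse it.
    """

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if self._skip_depth:
            self._skip_depth += 1      # nested tag inside a skipped subtree
        elif tag in BOILERPLATE_TAGS:
            self._skip_depth = 1       # enter a boilerplate subtree

    def handle_endtag(self, tag):
        if self._skip_depth:
            self._skip_depth -= 1      # leave (part of) a skipped subtree

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self._parts.append(data.strip())


def extract_main_text(html: str) -> str:
    """Return the page's visible main text with boilerplate removed."""
    parser = _MainTextParser()
    parser.feed(html)
    return " ".join(parser._parts)
```

Real readability implementations also score blocks by text density and link ratio; this sketch only shows the tag-pruning half.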
## Use Cases
| Who | Why |
|---|---|
| AI engineers | Pre-process web content for LLM/RAG pipelines at scale |
| Content aggregators | Clean article extraction without ad clutter |
| Research teams | Bulk content gathering for NLP datasets |
| SEO analysts | Compare content across competitor pages |
| Accessibility auditors | Check reading-only content structure |
## Input
| Field | Type | Default | Description |
|---|---|---|---|
| urls | string[] | (required) | URLs to extract (max 200) |
| outputFormat | string | markdown | text, markdown, or html |
| includeMetadata | boolean | true | Include metadata in output |
## Input Example

```json
{
  "urls": [
    "https://blog.example.com/post-1",
    "https://docs.example.com/guide"
  ],
  "outputFormat": "markdown",
  "includeMetadata": true
}
```
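The input constraints from the table above (at most 200 URLs, three allowed formats) can be enforced client-side before calling the API. `build_run_input` is a hypothetical helper, not part of the actor or the Apify SDK:

```python
ALLOWED_FORMATS = {"text", "markdown", "html"}
MAX_URLS = 200  # per-run limit from the input schema


def build_run_input(urls, output_format="markdown", include_metadata=True):
    """Validate and assemble the actor's run input (illustrative helper)."""
    if not urls:
        raise ValueError("at least one URL is required")
    if len(urls) > MAX_URLS:
        raise ValueError(f"at most {MAX_URLS} URLs per run")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"outputFormat must be one of {sorted(ALLOWED_FORMATS)}")
    return {
        "urls": list(urls),
        "outputFormat": output_format,
        "includeMetadata": include_metadata,
    }
```

Failing fast locally avoids paying the flat actor-start fee for a run that would only error out.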
## Output

| Field | Type | Description |
|---|---|---|
| url | string | Page URL |
| title | string | Extracted page title |
| content | string | Main content body (markdown, html, or text per `outputFormat`) |
| wordCount | integer | Word count of extracted content |
| language | string | Detected language code |
| publishedDate | string | ISO date, if metadata available |
| author | string | Author name, if metadata available |
| images | string[] | Image URLs found in main content |
## Output Example

```json
{
  "url": "https://blog.example.com/post-1",
  "title": "How to Build a SaaS",
  "author": "Jane Doe",
  "publishedDate": "2026-03-15",
  "content": "In this article we explore...",
  "contentMarkdown": "# How to Build a SaaS\n\nIn this article we explore...",
  "wordCount": 2450,
  "metadata": {"description": "...", "language": "en"}
}
```
## API Usage

Run this actor programmatically using the Apify API. Replace YOUR_API_TOKEN with your token from Apify Console → Settings → Integrations.
### cURL

```bash
curl -X POST "https://api.apify.com/v2/acts/taroyamada~website-content-extractor/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://blog.example.com/post-1", "https://docs.example.com/guide"],
    "outputFormat": "markdown",
    "includeMetadata": true
  }'
```
### Python

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("taroyamada/website-content-extractor").call(run_input={
    "urls": ["https://blog.example.com/post-1", "https://docs.example.com/guide"],
    "outputFormat": "markdown",
    "includeMetadata": True,
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
```
### JavaScript / Node.js

```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('taroyamada/website-content-extractor').call({
    urls: ['https://blog.example.com/post-1', 'https://docs.example.com/guide'],
    outputFormat: 'markdown',
    includeMetadata: true,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```
## Tips & Limitations

- Use `outputFormat: "markdown"` for LLM/RAG ingestion; it preserves document structure without HTML noise.
- Set `includeMetadata: true` to capture publish date, author, and OpenGraph data.
- A concurrency of 5 is a safe default; increase to 10 only on bandwidth-rich sites.
- Pair the output with a vector store to build a searchable knowledge base from any website.
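For the vector-store pairing mentioned in the tips, extracted markdown is typically split into overlapping chunks before embedding. A minimal word-window chunker; the sizes are arbitrary defaults of mine, not actor parameters:

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100):
    """Split text into overlapping word-window chunks for embedding."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
    return chunks
```

Word windows ignore sentence and heading boundaries; a production pipeline would usually split on the markdown structure the actor preserves.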
## FAQ

**How is this different from apify/website-content-crawler?**
No browser means it is much faster and cheaper. This actor uses plain HTTP plus heuristic extraction, which works well for standard server-rendered HTML sites.

**Does it work on JavaScript-heavy sites or SPAs?**
Only server-rendered content is extracted. SPAs that render content client-side won't work; use a browser-based scraper for those.

**What's the extraction accuracy?**
Roughly 90% for news, blog, and documentation pages. Product pages and complex layouts may need custom extraction.

**Can I customize which elements to remove?**
Not in the current version. The standard removal set is nav, header, footer, aside, script, style, and ads.

**Can I exclude navigation and ads?**
Yes; the actor uses readability heuristics to extract the main content and drop boilerplate automatically.
## Related Actors

Explore related Apify tools in the News & Content cluster:

- **Google News Scraper**: scrape Google News articles for any search query via the official RSS feed.
- **Article Extractor**: extract clean article content (title, author, publish date, images) from news and blog pages.
- **RSS Feed Aggregator**: aggregate multiple RSS and Atom feeds with keyword filtering and deduplication.
- **Hacker News Scraper**: fetch Hacker News top, new, best, ask, show, and job stories via the official Firebase API.
- **Reddit All-in-One Scraper**: scrape Reddit subreddits, posts, comments, user profiles, and search results via public JSON endpoints.
- **Reddit Keyword Monitor Alerts**: a focused Reddit keyword and subreddit monitor built for recurring alerts, snapshot diffing, and webhook handoff.
## Cost

Pay per event:

- actor-start: $0.01 (flat fee per run)
- dataset-item: $0.005 per output item

Example: 1,000 items = $0.01 + (1,000 × $0.005) = $5.01
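The arithmetic above is easy to encode as a quick estimator (a convenience sketch, not an official Apify calculator):

```python
ACTOR_START_USD = 0.01  # flat fee per run
PER_ITEM_USD = 0.005    # per dataset item


def estimate_cost(items: int, runs: int = 1) -> float:
    """Estimated pay-per-event cost in USD for a given item count."""
    return round(runs * ACTOR_START_USD + items * PER_ITEM_USD, 2)
```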
No subscription required; you only pay for what you use.