Website Content Extractor
Apify Actor: extract page content (markdown/HTML/text), metadata, and link stats. Uses crawl4ai.
Quick start
$ pip install -e ".[dev]"
$ crawl4ai-setup
$ python -m crawl4ai_actor.main
Input: startUrls (required), maxPages, maxDepth, waitUntil, waitForSelector, cssSelector, etc. Full schema: .actor/input_schema.json.
Output: dataset with url, success, content, title, content_length, links_internal_count, etc. Run summary in Storage → Key-value store (runSummary), including failedUrls for retries.
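Example dataset item (field names from the list above; values are illustrative and the exact field set may vary): { "url": "https://example.com/", "success": true, "title": "Example", "content": "...", "content_length": 4213, "links_internal_count": 12 }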
Options (high level)
| Option | Purpose |
|---|---|
| crawlMode | full (default) or discover_only; discover_only returns URLs and links only, no content |
| includeLinkUrls | Include links_internal / links_external arrays in each item |
| waitUntil | domcontentloaded, load, or networkidle (use networkidle for SPAs/slow sites) |
| pageLoadWaitSecs | Extra delay before capture |
| waitForSelector | Wait for a CSS selector (or css:/js: prefix) |
| cssSelector | Extract only this region (e.g. main, .article) |
| virtualScrollSelector | Infinite-scroll container to expand |
Example — SPA / slow site: { "startUrls": ["https://..."], "waitUntil": "networkidle", "pageLoadWaitSecs": 2 }
Example — discover links only: { "startUrls": ["https://..."], "crawlMode": "discover_only", "maxPages": 100 }
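Example (region-only extraction; option names come from the table above, values are illustrative): { "startUrls": ["https://..."], "cssSelector": "main", "waitForSelector": "main", "maxPages": 20 }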
Run locally / Docker
$ docker build -t website-content-extractor .
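To smoke-test the image built above (how Actor input and storage are wired up locally depends on your Apify setup and is not covered here):
$ docker run --rm website-content-extractor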
Regression
$ UX_MATRIX_GROUP=core python scripts/ux_matrix.py
Reports: scripts/ux_matrix_output.json, scripts/ux_matrix_report.txt (gitignored).
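To inspect the JSON report programmatically, a minimal sketch (the report schema is not documented here; this only loads the file and lists its top-level keys):

```python
# Load the regression report and list its top-level keys.
# Assumes it is run from the repository root after the matrix has been generated.
import json
from pathlib import Path

report = json.loads(Path("scripts/ux_matrix_output.json").read_text())
print(sorted(report) if isinstance(report, dict) else type(report).__name__)
```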