Pricing

Pay per usage

Crawl4ai

Extract page content (markdown/HTML/text), metadata, and link stats. Uses crawl4ai.

Rating

0.0 (0)

Developer

Kael Odin

Actor stats

Last modified

3 months ago

Categories

Other

Start URLs

startUrls

Required

Starting URLs to visit.

Type:string[]

Max Pages

maxPages

Optional

Maximum pages to process in total.

Type:integer

Minimum:1

Maximum:10000

Default:50

Max Depth

maxDepth

Optional

Maximum link depth from each start URL.

Type:integer

Minimum:0

Maximum:10

Default:2
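
How maxPages and maxDepth interact can be illustrated with a small breadth-first traversal. This is a minimal sketch, not the actor's code: the traversal order is an assumption (the page does not say whether the crawl is breadth-first), and `crawl_order` and the `LINKS` graph are hypothetical names used only for illustration.

```python
from collections import deque


def crawl_order(start_urls, links, max_pages=50, max_depth=2):
    """Return (url, depth) pairs in breadth-first order, bounded by both limits."""
    unique_starts = list(dict.fromkeys(start_urls))  # drop duplicate start URLs
    seen = set(unique_starts)
    queue = deque((url, 0) for url in unique_starts)
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append((url, depth))
        if depth >= max_depth:
            continue  # links on this page would exceed the depth limit
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return visited


# Hypothetical link graph standing in for real pages.
LINKS = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/a/1"],
    "https://example.com/a/1": ["https://example.com/a/1/x"],
}

pages = crawl_order(["https://example.com/"], LINKS, max_pages=50, max_depth=2)
```

With maxDepth set to 0, only the start URLs themselves would be processed under this reading; maxPages then caps the total regardless of depth.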

Concurrency

concurrency

Optional

Number of concurrent tasks.

Type:integer

Minimum:1

Maximum:50

Default:5

Request Timeout (seconds)

requestTimeoutSecs

Optional

Timeout per page request.

Type:integer

Minimum:5

Maximum:600

Default:60

Headless

headless

Optional

Run browser headless.

Type:boolean

Default:true

Use Proxy

useProxy

Optional

Enable Apify proxy for requests.

Type:boolean

Default:false

Proxy Groups

proxyGroups

Optional

Apify proxy groups to use.

Type:string[]

Extract Mode

extractMode

Optional

Output format.

Type:string

Default:markdown

Options:

markdown, html, text

Max Results

maxResults

Optional

Maximum output items to push.

Type:integer

Minimum:1

Maximum:200000

Default:1000

Same Domain Only

sameDomainOnly

Optional

Only follow links within start URL domains.

Type:boolean

Default:true

Include URL Patterns

includePatterns

Optional

Only include URLs matching these regex patterns.

Type:string[]

Exclude URL Patterns

excludePatterns

Optional

Exclude URLs matching these regex patterns.

Type:string[]
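
The three link filters (sameDomainOnly, includePatterns, excludePatterns) could combine roughly as follows. This is a sketch under stated assumptions: the page does not say whether patterns are applied with search or full-match semantics, whether exclusion takes precedence over inclusion, or how subdomains are treated. Here `re.search` is used, exclusion wins, and hosts must match exactly. `should_follow` is a hypothetical helper, not part of the actor.

```python
import re
from urllib.parse import urlparse


def should_follow(url, start_urls, same_domain_only=True,
                  include_patterns=(), exclude_patterns=()):
    """Decide whether a discovered link passes the configured filters."""
    if same_domain_only:
        allowed_hosts = {urlparse(s).netloc.lower() for s in start_urls}
        if urlparse(url).netloc.lower() not in allowed_hosts:
            return False
    # Assumption: an exclude match always rejects the URL.
    if any(re.search(p, url) for p in exclude_patterns):
        return False
    # Assumption: an empty include list means "include everything".
    if include_patterns and not any(re.search(p, url) for p in include_patterns):
        return False
    return True


START = ["https://example.com/"]
keep = should_follow("https://example.com/blog/post", START,
                     include_patterns=[r"/blog/"])
```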

Max Retries

maxRetries

Optional

Retry failed pages up to this count.

Type:integer

Minimum:0

Maximum:10

Default:2

Retry Backoff (seconds)

retryBackoffSecs

Optional

Base backoff in seconds, doubled each retry.

Type:integer

Minimum:0

Maximum:120

Default:2
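
The retry schedule implied by maxRetries and retryBackoffSecs can be worked out directly. The sketch below assumes the first retry waits the base backoff and each later retry waits twice as long as the previous one; whether the actor adds jitter or a cap is not stated. `retry_delays` is a hypothetical helper.

```python
def retry_delays(max_retries=2, retry_backoff_secs=2):
    """Return the wait in seconds before each retry, doubling every time."""
    return [retry_backoff_secs * (2 ** attempt) for attempt in range(max_retries)]


default_schedule = retry_delays()        # defaults: 2 retries, 2-second base
longer_schedule = retry_delays(4, 1)     # 4 retries, 1-second base
```

Under these assumptions the defaults add at most 6 seconds of waiting per failed page (2 + 4), on top of the request timeouts themselves.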

Max Requests Per Minute

maxRequestsPerMinute

Optional

Global rate limit. Set 0 for unlimited.

Type:integer

Minimum:0

Maximum:6000

Default:0
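
One way to read maxRequestsPerMinute is as a minimum spacing between requests. The sketch below assumes evenly spaced requests; the actor may instead use a token bucket or a sliding window, which the page does not specify. `min_interval_secs` is a hypothetical helper.

```python
def min_interval_secs(max_requests_per_minute=0):
    """Return the minimum gap between requests; 0 disables rate limiting."""
    if max_requests_per_minute <= 0:
        return 0.0
    return 60.0 / max_requests_per_minute


unlimited = min_interval_secs(0)
one_per_second = min_interval_secs(60)
```

Because the limit is described as global, it would apply across all concurrent tasks, so a high concurrency value would not raise throughput past this rate.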

Enable Stealth

enableStealth

Optional

Enable stealth mode for tougher sites.

Type:boolean

Default:false

User Agent

userAgent

Optional

Custom user agent string.

Type:string

Clean Content

cleanContent

Optional

Remove navigation-heavy lines and normalize whitespace.

Type:boolean

Default:true

Include Raw Content

includeRawContent

Optional

Include unmodified content output in a separate field.

Type:boolean

Default:false

Max Content Characters

maxContentChars

Optional

Truncate content to this length (0 = unlimited).

Type:integer

Minimum:0

Maximum:500000

Default:0

Content Excerpt Characters

contentExcerptChars

Optional

Length of the content excerpt for quick previews.

Type:integer

Minimum:0

Maximum:5000

Default:300
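
The effect of maxContentChars and contentExcerptChars can be sketched with plain string slicing. Assumptions: the excerpt is taken from the start of the (possibly truncated) content, and the output field names `content` and `excerpt` are hypothetical, since the page does not list the output schema.

```python
def shape_content(content, max_content_chars=0, content_excerpt_chars=300):
    """Truncate content (0 = unlimited) and derive a short preview excerpt."""
    truncated = content if max_content_chars == 0 else content[:max_content_chars]
    excerpt = truncated[:content_excerpt_chars]
    return {"content": truncated, "excerpt": excerpt}


text = "a" * 1000
item = shape_content(text)                       # defaults: no truncation
short = shape_content(text, max_content_chars=100)
```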

Word Count Threshold

wordCountThreshold

Optional

Ignore text blocks with fewer words than this threshold (0 = off). Reduces noise from empty or stub pages.

Type:integer

Minimum:0

Maximum:1000

Default:0
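
The word count threshold can be pictured as a filter over text blocks. This sketch assumes whitespace-separated words and that a block with exactly the threshold number of words is kept; how the actor splits a page into blocks is not described, so `drop_short_blocks` operates on a hypothetical pre-split list.

```python
def drop_short_blocks(blocks, word_count_threshold=0):
    """Keep only blocks with at least the threshold number of words."""
    if word_count_threshold == 0:
        return list(blocks)  # 0 switches the filter off
    return [b for b in blocks if len(b.split()) >= word_count_threshold]


BLOCKS = ["Home", "Sign in now", "This paragraph has enough words to keep."]
filtered = drop_short_blocks(BLOCKS, word_count_threshold=5)
```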

Virtual Scroll Selector

virtualScrollSelector

Optional

CSS selector for infinite-scroll container (e.g. #feed). When set, the crawler scrolls to load more content before extraction.

Type:string

Virtual Scroll Count

virtualScrollCount

Optional

Max scroll steps when virtual scroll is enabled.

Type:integer

Minimum:1

Maximum:100

Default:10

Wait Until

waitUntil

Optional

Page load strategy: domcontentloaded (fast), load (full load), or networkidle (SPA/slow sites).

Type:string

Default:domcontentloaded

Options:

domcontentloaded, load, networkidle

Page Load Wait (seconds)

pageLoadWaitSecs

Optional

Extra delay in seconds after load before capturing HTML. Use for slow/SPA sites.

Type:number

Minimum:0

Maximum:60

Default:0

Wait For Selector

waitForSelector

Optional

CSS selector to wait for before extraction (e.g. .article-body or #main). Use css: or js: prefix for advanced conditions.

Type:string

Wait For Timeout (seconds)

waitForTimeoutSecs

Optional

Max seconds to wait for Wait For Selector. Ignored if Wait For Selector is empty.

Type:integer

Minimum:1

Maximum:300

Default:30

CSS Selector (extract region only)

cssSelector

Optional

Extract only content inside this CSS selector (e.g. main, .content, #article).

Type:string

Crawl Mode

crawlMode

Optional

full = extract page content; discover_only = collect only URLs and links (no content, faster).

Type:string

Default:full

Options:

full, discover_only

Include Link URLs

includeLinkUrls

Optional

Include links_internal and links_external arrays in each item (full mode only).

Type:boolean

Default:false
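
Putting the fields above together, a run input might be assembled and checked against the documented ranges before starting the actor. This is a sketch, not an official client: `build_input` and `run_actor` are hypothetical helpers, only a subset of the numeric limits is checked, and the actor ID is not shown on this page, so it must be supplied by the caller. `run_actor` relies on the third-party `apify-client` package and assumes `call()` returns a dict containing `defaultDatasetId`.

```python
# Ranges and options copied from the field descriptions above (subset only).
LIMITS = {
    "maxPages": (1, 10000),
    "maxDepth": (0, 10),
    "concurrency": (1, 50),
    "requestTimeoutSecs": (5, 600),
    "maxResults": (1, 200000),
    "maxRetries": (0, 10),
}
ENUMS = {
    "extractMode": {"markdown", "html", "text"},
    "waitUntil": {"domcontentloaded", "load", "networkidle"},
    "crawlMode": {"full", "discover_only"},
}


def build_input(start_urls, **options):
    """Return an input dict, raising ValueError for out-of-range values."""
    if not start_urls:
        raise ValueError("startUrls is required")
    for name, value in options.items():
        if name in LIMITS:
            low, high = LIMITS[name]
            if not low <= value <= high:
                raise ValueError(f"{name} must be between {low} and {high}")
        if name in ENUMS and value not in ENUMS[name]:
            raise ValueError(f"{name} must be one of {sorted(ENUMS[name])}")
    return {"startUrls": list(start_urls), **options}


def run_actor(token, actor_id, run_input):
    """Start the actor and return its dataset items (needs network access)."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


run_input = build_input(
    ["https://example.com"],
    maxPages=20,
    maxDepth=1,
    extractMode="markdown",
    crawlMode="full",
)
```

Fields left out of the input fall back to the defaults listed above, so a minimal run only needs startUrls.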

Crawl4ai To Markdown Pro2

juryless_rainbow/crawl4ai-to-markdown-pro2

A high-performance web-to-markdown crawler for AI agents, optimized for LLM data extraction using Crawl4AI. Features stealth browsing and high-fidelity content extraction.

aaron jungs

Website Content Crawler for AI — Clean Markdown, 4x Cheaper

joyouscam35875/website-content-crawler

Crawl any website and extract clean text/markdown for LLMs, RAG pipelines, vector databases. BFS crawl with depth control, robots.txt support, boilerplate removal. Perfect for feeding AI models. $0.001/page — 4x cheaper than the official Apify crawler.

Ken Digital

Website Content Extractor

glowing_glove/website-content-extractor

Crawl public pages and extract page titles, meta descriptions, headings, readable text, source URLs, and crawl metadata.

Ushba Khan

Website Content Extractor for RAG: Markdown, HTML, Text

nezha/website-content-crawler

Turn docs sites, help centers, blogs, and websites into clean markdown, text, or HTML for RAG, AI knowledge bases, and internal search. Crawl from start URLs or sitemaps and keep the crawl in scope.

nezha

5.0

RAG Web Browser Scraper

datapilot/rag-web-browser-scraper

RAG Web Browser Search & Crawl Actor searches Bing or crawls URLs, then extracts page content as clean markdown. It captures title, description, language, HTTP status, and structured metadata. Supports multiple queries and proxies, and outputs organized crawl and search results.

Data Pilot

Website Content Pipeline for AI: Markdown, Tokens, RAG Chunks

scrapemint/website-content-crawler

Crawl any website and ship clean Markdown, plain text, and HTML for AI, LLM, and RAG pipelines. Each row carries token estimates, JSON LD metadata, link graph, and optional auto chunk splitting for vector databases. Pay per page.

Ken M

AI Web Content Crawler - Markdown for LLMs

intelscrape/ai-web-content-crawler

Crawl any website and extract clean Markdown optimized for LLM training, RAG pipelines, and AI knowledge bases - removes boilerplate and outputs structured JSON with URL, title, markdown, and metadata.

IntelScrape

Web Page to Markdown Extractor — URL to Markdown API

fetch_cat/web-page-to-markdown-extractor

Convert public URLs into clean Markdown, text, metadata, links, images, and optional HTML for AI agents, RAG, support, and automation workflows.

Hanna Nosova

Website URL Crawler & Link Extractor

maximedupre/website-url-crawler

Crawl JavaScript-rendered websites and export a URL link map. Get source pages, depth, anchor text, link type, HTTP metadata, and crawl status.

Maxime Dupré

Ai Ready Web Page To Markdown Converter

mustafa.irshaid.113/ai-ready-web-page-to-markdown-converter

Convert any webpage into structured Markdown and HTML using just a URL. Get the page title, link, and content—perfect for SEO, devs, and AI crawlers. Fast, clean, and ideal for repurposing or analysis. Start turning websites into Markdown instantly.