Website Content Extractor
Pricing
Pay per event
Extract clean text and markdown from docs, pricing, product, policy, and help-center URLs for RAG datasets and content operations.
Public website URLs to clean (max 200). Best for docs, product, pricing, policy, and knowledge-base pages; route article, news, and blog URLs to article-content-extractor instead.
Output format. Choose markdown for the clearest first-run result and the easiest downstream reuse.
Include page metadata such as description, author, language, and published date when available.
Number of pages to fetch in parallel.
HTTP timeout per page in milliseconds.
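How the parallelism and per-page timeout settings might interact can be sketched as follows. This is an illustrative sketch only, not the extractor's actual code: the names `fetch_all`, `fetch_page`, `concurrency`, and `timeout_ms` are hypothetical, and the page fetcher is passed in so the example runs without network access.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional


async def fetch_all(
    urls: List[str],
    fetch_page: Callable[[str], Awaitable[str]],  # hypothetical fetcher supplied by the caller
    concurrency: int = 5,       # assumed meaning: pages fetched in parallel
    timeout_ms: int = 30_000,   # assumed meaning: timeout per page, in milliseconds
) -> Dict[str, Optional[str]]:
    """Fetch pages with at most `concurrency` in flight; None marks a timed-out page."""
    limiter = asyncio.Semaphore(concurrency)

    async def fetch_one(url: str) -> Optional[str]:
        async with limiter:  # blocks while `concurrency` fetches are already running
            try:
                # wait_for takes seconds, so convert from milliseconds
                return await asyncio.wait_for(fetch_page(url), timeout=timeout_ms / 1000)
            except asyncio.TimeoutError:
                return None

    bodies = await asyncio.gather(*(fetch_one(url) for url in urls))
    return dict(zip(urls, bodies))
```

In this sketch a slow page costs at most one timeout period and does not fail the whole batch; whether the real extractor retries or drops such pages is not stated in the listing.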
Select dataset-only output or webhook handoff. A non-dry run always writes canonical dataset rows first; webhook delivery runs only after the dataset and pay-per-event (PPE) output succeeds.
Webhook destination used when delivery=webhook. The webhook is sent after canonical dataset rows and PPE output succeed, and is skipped on dryRun.
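The delivery ordering described above (dataset rows first, webhook only afterwards, nothing sent on a dry run) can be sketched roughly like this. The function and parameter names are hypothetical, the two writer callables stand in for whatever the extractor really uses, and the choice to write nothing at all on a dry run is an assumption; the listing only says the webhook is skipped.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, str]


def deliver(
    rows: List[Row],
    delivery: str,                      # "dataset" or "webhook" (assumed values)
    webhook_url: Optional[str],
    dry_run: bool,
    write_dataset: Callable[[List[Row]], None],     # hypothetical; raises on failure
    send_webhook: Callable[[str, List[Row]], None],  # hypothetical; raises on failure
) -> List[str]:
    """Return the delivery steps that actually ran, in order."""
    steps: List[str] = []
    if dry_run:
        # Assumption: a dry run writes nothing; the source only states the webhook is skipped.
        return steps

    # Canonical dataset rows are always written first on a real run.
    write_dataset(rows)  # an exception here means the webhook is never attempted
    steps.append("dataset")

    # Webhook handoff happens only when requested and only after the dataset write succeeded.
    if delivery == "webhook" and webhook_url:
        send_webhook(webhook_url, rows)
        steps.append("webhook")
    return steps
```

Because the dataset write comes first and raises on failure, a consumer of the webhook in this sketch can assume the rows it is told about already exist in the dataset.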