Website Content Crawler

Deprecated

Full website crawling

Pricing

from $0.50 / 1,000 results

Rating

0.0

(0)

Developer

Virtual Footprint LLC

Actor stats

Last modified

20 days ago

Features

Collect structured content and metadata fields
Support direct URL and query-driven extraction modes
Return normalized records suitable for RAG and analytics
Batch-friendly processing for multiple sources
Designed for automated content pipelines
Output is formatted for typical Website Content Crawler workflows on Apify

Common Use Cases

Content intelligence
Article monitoring
Knowledge base ingestion
Research workflows
Data enrichment
Internal reporting

Example Input

{
  "query": "market research",
  "queries": [
    "market research"
  ],
  "urls": [
    "https://www.example.org"
  ],
  "maxResults": 25,
  "includeRaw": false,
  "maxCostPerRun": 5
}

Example Output

{
  "query": "market research",
  "url": "https://www.example.org",
  "actorSlug": "record value",
  "source": "record value",
  "title": "record value",
  "description": "record value",
  "scrapedAt": "record value"
}

Input Parameters

Field	Type	Required	Description
query	string	No	Primary keyword, URL, profile, company, product, or identifier to collect
queries	array	No	Optional batch list of query strings. Used when query is empty or when batching is…
urls	array	No	Optional direct URLs to process. These take priority over discovery when provided
maxResults	integer	No	Maximum number of dataset items to emit
includeRaw	boolean	No	Include collection diagnostics and raw source metadata where available
maxCostPerRun	number	No	Optional guardrail in USD. The actor caps output before exceeding this amount
proxyConfiguration	object	No	Apify proxy settings for production runs
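
The parameter table above can be turned into a quick pre-flight check before a run is started. The following is a minimal sketch, not the Actor's own validation: the `INPUT_SCHEMA` mapping and the `validate_input` helper are hypothetical names, and the type mapping is only a transcription of the table into Python types.

```python
from typing import Any

# Field name -> accepted Python types, transcribed from the Input Parameters table.
# This mapping is an illustration, not the Actor's published input schema.
INPUT_SCHEMA: dict[str, tuple[type, ...]] = {
    "query": (str,),
    "queries": (list,),
    "urls": (list,),
    "maxResults": (int,),
    "includeRaw": (bool,),
    "maxCostPerRun": (int, float),
    "proxyConfiguration": (dict,),
}


def validate_input(payload: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the payload looks well-formed."""
    problems: list[str] = []
    for key, value in payload.items():
        expected = INPUT_SCHEMA.get(key)
        if expected is None:
            problems.append(f"unknown field: {key}")
        # bool is a subclass of int in Python, so reject it for numeric fields explicitly.
        elif isinstance(value, bool) and bool not in expected:
            problems.append(f"{key}: expected {expected[0].__name__}, got bool")
        elif not isinstance(value, expected):
            problems.append(
                f"{key}: expected {expected[0].__name__}, got {type(value).__name__}"
            )
    return problems


# The Example Input from this page.
example_input = {
    "query": "market research",
    "queries": ["market research"],
    "urls": ["https://www.example.org"],
    "maxResults": 25,
    "includeRaw": False,
    "maxCostPerRun": 5,
}

print(validate_input(example_input))
```

Since every field is listed as optional, the sketch checks only types and unknown keys and does not enforce required fields.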

Output Fields

Field	Type	Description
query	string	Normalized query value
url	string	Normalized url value
actorSlug	string	Normalized actorSlug value
source	string	Normalized source value
title	string	Normalized title value
description	string	Normalized description value
scrapedAt	string	Normalized scrapedAt value
runId	string	Normalized runId value
rank	string	Normalized rank value
content	string	Normalized content value
summary	string	Normalized summary value
author	string	Normalized author value
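
For downstream code, the output fields above can be modeled as a typed record. This is a sketch under assumptions: `CrawlRecord` and `parse_record` are hypothetical names, and every field is treated as optional because the Example Output shows only a subset of the fields listed in the table.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class CrawlRecord:
    """One dataset item; field names and string types follow the Output Fields table."""
    query: Optional[str] = None
    url: Optional[str] = None
    actorSlug: Optional[str] = None
    source: Optional[str] = None
    title: Optional[str] = None
    description: Optional[str] = None
    scrapedAt: Optional[str] = None
    runId: Optional[str] = None
    rank: Optional[str] = None
    content: Optional[str] = None
    summary: Optional[str] = None
    author: Optional[str] = None


def parse_record(item: dict[str, Any]) -> CrawlRecord:
    """Build a CrawlRecord, ignoring keys that are not in the table."""
    known = {f.name for f in fields(CrawlRecord)}
    return CrawlRecord(**{k: v for k, v in item.items() if k in known})


record = parse_record({"url": "https://www.example.org", "title": "Example"})
print(record.url, record.content)
```

Ignoring unknown keys keeps the parser tolerant of extra diagnostics, such as those that `includeRaw` may add.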

Export Formats

JSON
CSV
Excel
XML
RSS
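
As an illustration of how these formats are typically requested, the sketch below builds download URLs for a run's dataset using the `format` query parameter of the Apify API dataset items endpoint. The `dataset_export_url` helper and the dataset ID are placeholders, and the endpoint shape should be checked against the current Apify API reference before use.

```python
from urllib.parse import urlencode

# Format labels from this page mapped to the `format` values assumed for the Apify API.
EXPORT_FORMATS = {
    "JSON": "json",
    "CSV": "csv",
    "Excel": "xlsx",
    "XML": "xml",
    "RSS": "rss",
}


def dataset_export_url(dataset_id: str, label: str) -> str:
    """Return a dataset items URL for one of the export formats listed above."""
    query = urlencode({"format": EXPORT_FORMATS[label]})
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{query}"


# "DATASET_ID" is a placeholder for the default dataset ID of a finished run.
print(dataset_export_url("DATASET_ID", "CSV"))
```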

Pricing

Pricing Model: PAY_PER_EVENT

$3.00 per 1,000 dataset items.
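
A short worked example of the arithmetic, using the $3.00 per 1,000 items figure stated in this section (the "from $0.50" figure in the listing header is not used here). The helper names are hypothetical, and how the Actor actually applies `maxCostPerRun` is not documented on this page, so the cap calculation is only an assumption about the intended behavior.

```python
PRICE_PER_1000_ITEMS_USD = 3.00  # from the Pricing section above


def estimated_cost(items: int) -> float:
    """Estimated charge in USD for a given number of dataset items."""
    return items * PRICE_PER_1000_ITEMS_USD / 1000


def max_items_within_budget(budget_usd: float) -> int:
    """Largest whole number of items whose cost stays within the budget."""
    return int(budget_usd * 1000 // PRICE_PER_1000_ITEMS_USD)


# Values from the Example Input: maxResults = 25, maxCostPerRun = 5.
max_results = 25
budget_cap = max_items_within_budget(5)

print(estimated_cost(max_results))   # cost of 25 items
print(budget_cap)                    # items affordable with a 5 USD guardrail
print(min(max_results, budget_cap))  # whichever limit is reached first
```

With the example values, `maxResults` is the binding limit: 25 items cost well under the 5 USD guardrail.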

FAQ

Does this actor support batch processing?

Yes.

Can I export results to CSV?

Yes.

Can I schedule runs?

Yes, through Apify schedules.

Can I run this actor via API?

Yes, via the Apify API.

Does it support direct URLs?

Yes.

Can I integrate this actor with n8n or Make?

Yes.
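
For the API question above, a run is typically started by POSTing the input JSON to the Actor's `runs` endpoint. The sketch below only constructs the request and does not send it. The Actor ID and token are placeholders because this listing does not state the Actor's slug, and a deprecated Actor may no longer accept runs. `build_run_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST request that would start an Actor run."""
    url = f"https://api.apify.com/v2/acts/{actor_id}/runs"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# "username~actor-name" and "APIFY_TOKEN" are placeholders, not real identifiers.
request = build_run_request(
    "username~actor-name",
    "APIFY_TOKEN",
    {"urls": ["https://www.example.org"], "maxResults": 25},
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request), then read the JSON response.
```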

Website Content Crawler

apify/website-content-crawler

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

Apify

143K

4.5

(213)

Fast Website Content Crawler

6sigmag/fast-website-content-crawler

A high-performance web scraper that rapidly extracts and analyzes content from multiple websites simultaneously. Perfect for competitive research, content aggregation, and website structure analysis.

David

4.3K

4.9

(7)

🔥 FireScrape AI Website Content Markdown Scraper

mohamedgb00714/fireScraper-AI-Website-Content-Markdown-Scraper

Advanced web scraper powered by Crawlee and Puppeteer — extracts website content, converts it to Markdown, and structures it for LLM training datasets.

mohamed el hadi msaid

303

1.9

(2)

Website Content Crawler

parseforge/website-content-crawler

Crawl any website and pull clean Markdown content ready for AI! Follow links across a whole domain and extract page text, titles, headings, images, and metadata. Perfect for building RAG pipelines, training datasets, knowledge bases, and vector databases. Start crawling content in minutes!

ParseForge

Deep Website Content Crawler

6sigmag/deep-website-content-crawler

Scrape Failed Killer! A high-performance web scraper that rapidly extracts and analyzes content from multiple websites simultaneously. Perfect for competitive research, content aggregation, and website structure analysis.

David

1.1K

3.0

(1)

Website Content Crawler

mikolabs/website-content-crawler

Deep-crawl websites to extract clean text, Markdown, or HTML for AI/LLM apps, RAG pipelines, and vector databases. Supports adaptive crawling, HTML cleaning, file downloads, and structured dataset output. Easily integrates with LangChain, LlamaIndex, and other LLM tools.

mikolabs

5.0

(1)

🧪High-Volume Website Content & Media Scraper

caring_dizi/blog-content-scraper-fixed

🧪Crawling Done Right! Let me know what you think and where or how I can improve my Actor; I am all for constructive criticism. Please message me if you have any questions. Enjoy and have a good day.

Jeff Halverson

152

5.0

(2)

AI Website Content Markdown Scraper

quaking_pail/ai-website-content-markdown-scraper

This Apify Actor, "Website Content Crawler with Markdown Extraction," is designed to perform a comprehensive crawl of specified websites, extract their text content, convert it into Markdown format, and store it in a structured dataset. The extracted content is suitable for feeding LLMs.

AI_Builder

939

2.3

(3)

Website Content Crawler

alizarin_refrigerator-owner/website-crawler

Crawl websites for SEO audits. Extracts HTML, title, meta tags, headings, links, and text content from pages. Features: automatic sitemap detection and parsing; metadata extraction (title, description, OG tags); heading structure (H1, H2, H3); internal and external link analysis; image extraction with alt text; word count.

The Howlers

121

Website Content Text Extractor

smart-digital/website-content-text-extractor

Extract visible text content from websites as structured JSON blocks. Supports multi-URL batch processing, header/footer/cookie exclusion, and optional form extraction. Perfect for content analysis and translation workflows.

My Smart Digital

5.0

(1)

Website Content Crawler

jasondev/website-content-crawler

A powerful web crawler that extracts text content from websites, optimized for AI models, Large Language Models (LLMs), vector databases, and Retrieval-Augmented Generation (RAG) pipelines.