
Website Crawler - SEO Audit Crawler with LLM-Ready Markdown Extraction

Fast, reliable website crawler built for SEO audits and AI/LLM content analysis. Auto-discovers sitemaps, extracts metadata, headings, links, images, and LLM-ready markdown content from every page. Uses Mozilla Readability and Turndown for Firecrawl-like markdown extraction without external API costs.
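
Curious how that pipeline fits together? Below is a minimal sketch of a Readability + Turndown conversion in TypeScript. It illustrates the general approach only (not the actor's actual source) and assumes the page HTML has already been fetched:

// Minimal sketch: strip boilerplate with Mozilla Readability, then
// convert the cleaned article HTML to markdown with Turndown.
import { JSDOM } from "jsdom";
import { Readability } from "@mozilla/readability";
import TurndownService from "turndown";

function htmlToMarkdown(html: string, url: string): string {
  // Parse the raw HTML; passing the URL lets relative links resolve.
  const dom = new JSDOM(html, { url });
  // Readability drops navigation, ads, and page chrome, keeping the main content.
  const article = new Readability(dom.window.document).parse();
  if (!article?.content) return "";
  // Turndown converts the cleaned HTML fragment into markdown.
  return new TurndownService({ headingStyle: "atx" }).turndown(article.content);
}

Because the conversion runs locally, there is no per-request API charge for the markdown step.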

Features

  • LLM-Ready Markdown - Firecrawl-like extraction using Mozilla Readability + Turndown
  • Sitemap Discovery - Automatic detection of sitemap.xml, sitemap_index.xml, wp-sitemap.xml
  • Full HTML Extraction - Complete page HTML for custom parsing
  • Metadata Extraction - Title, description, OG tags, canonical, robots directives
  • Heading Structure - H1, H2, H3 hierarchy analysis
  • Link Analysis - Internal and external link mapping
  • Image Extraction - All images with alt text and src URLs
  • Word Count - Text and markdown word counts per page
  • Load Time Metrics - Page load time in milliseconds
  • Status Code Tracking - HTTP status codes for broken link detection
  • Bot Evasion - Fingerprint injection for reliable crawling
  • Configurable Depth - Set crawl depth and page limits
  • Demo Mode - Test with sample data before going live

Who Should Use This Actor?

SEO Agencies

Run technical SEO audits at scale. Extract metadata, heading structure, link architecture, and content from every page of a client's website in one crawl.

Content Teams

Extract clean markdown from any website for AI/LLM processing, content analysis, or migration projects. No Firecrawl API key required.

AI/ML Engineers

Build training datasets from websites with clean markdown extraction. Each page outputs structured data ready for LLM fine-tuning or RAG pipelines.

Web Developers

Audit site structure before migrations. Map internal links, find broken pages, and inventory all URLs with metadata.

Digital Marketing Agencies

Create comprehensive site audits for client onboarding. Analyze meta tags, heading hierarchy, and content structure across entire websites.

Competitive Intelligence Teams

Crawl competitor websites to analyze their content structure, internal linking strategy, and page architecture.

Quick Start

Demo Mode (Free Test)

{
  "demoMode": true
}

Basic Website Crawl

{
  "startUrls": [{ "url": "https://example.com" }],
  "maxCrawlPages": 25,
  "maxCrawlDepth": 2,
  "crawlSitemap": true,
  "demoMode": false
}

Deep SEO Audit Crawl

{
  "startUrls": [{ "url": "https://example.com" }],
  "maxCrawlPages": 500,
  "maxCrawlDepth": 5,
  "crawlSitemap": true,
  "demoMode": false
}

Multi-Site Crawl

{
  "startUrls": [
    { "url": "https://site1.com" },
    { "url": "https://site2.com" },
    { "url": "https://site3.com" }
  ],
  "maxCrawlPages": 100,
  "maxCrawlDepth": 3,
  "crawlSitemap": true,
  "demoMode": false
}

Input Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| startUrls | array | - | URLs to start crawling from (required unless demoMode) |
| maxCrawlPages | number | 25 | Maximum pages to crawl per site |
| maxCrawlDepth | number | 2 | Maximum link depth to follow |
| crawlSitemap | boolean | true | Auto-discover and parse sitemaps |
| proxyConfiguration | object | Residential | Proxy settings |
| demoMode | boolean | true | Return sample data for testing |
| webhookUrl | string | - | Webhook URL for results delivery |
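
As a usage illustration, these parameters map directly onto a programmatic run via the official apify-client package. This is a hedged sketch: the actor ID is a placeholder, so substitute the ID from this actor's store page.

import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Start the actor and wait for the run to finish.
const run = await client.actor("<ACTOR_ID>").call({
  startUrls: [{ url: "https://example.com" }],
  maxCrawlPages: 25,
  maxCrawlDepth: 2,
  crawlSitemap: true,
  demoMode: false,
});

// Fetch the crawled pages from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Crawled ${items.length} pages`);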

Output Format

{
  "url": "https://example.com/page",
  "title": "Page Title",
  "html": "<html>...</html>",
  "text": "Page text content...",
  "markdown": "# Page Title\n\nClean markdown content ready for LLMs...",
  "statusCode": 200,
  "loadTimeMs": 1234,
  "metadata": {
    "description": "Meta description",
    "keywords": "seo, crawler",
    "ogTitle": "Open Graph Title",
    "ogDescription": "OG description",
    "ogImage": "https://example.com/og.jpg",
    "canonical": "https://example.com/page",
    "robots": "index, follow"
  },
  "headings": {
    "h1": ["Main Heading"],
    "h2": ["Subheading 1", "Subheading 2"],
    "h3": []
  },
  "links": {
    "internal": ["https://example.com/other"],
    "external": ["https://external.com"]
  },
  "images": [
    { "src": "https://example.com/image.jpg", "alt": "Alt text" }
  ],
  "wordCount": 1500,
  "markdownWordCount": 1200,
  "crawledAt": "2026-01-28T12:00:00.000Z"
}
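
A typical consumer of this output is an audit script. The sketch below flags broken pages, missing meta descriptions, and H1 problems; the field names are taken from the format above, while the CrawledPage interface is a simplified assumption:

// Simplified view of a dataset item; only the fields the audit needs.
interface CrawledPage {
  url: string;
  statusCode: number;
  metadata: { description?: string };
  headings: { h1: string[] };
}

function auditPages(items: CrawledPage[]): string[] {
  const issues: string[] = [];
  for (const page of items) {
    if (page.statusCode >= 400) issues.push(`${page.url}: HTTP ${page.statusCode}`);
    if (!page.metadata.description) issues.push(`${page.url}: missing meta description`);
    if (page.headings.h1.length !== 1)
      issues.push(`${page.url}: expected 1 h1, found ${page.headings.h1.length}`);
  }
  return issues;
}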

Pricing (Pay-Per-Event)

| Event | Description | Price |
|---|---|---|
| page_crawled | Per page crawled with full extraction | $0.005 |
| sitemap_discovered | Per sitemap discovered and parsed | $0.01 |

Example costs:

  • 25 pages (no sitemap): 25 x $0.005 = $0.125
  • 100 pages + sitemap: (100 x $0.005) + $0.01 = $0.51
  • 500 pages + sitemap: (500 x $0.005) + $0.01 = $2.51
  • Demo mode: $0.00
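
The arithmetic above reduces to a one-line helper, assuming one sitemap_discovered event per site when sitemap crawling is enabled:

// Estimate a run's cost in USD from the event prices listed above.
function estimateCostUsd(pages: number, sitemapDiscovered: boolean): number {
  const PAGE_PRICE = 0.005;    // per page_crawled event
  const SITEMAP_PRICE = 0.01;  // per sitemap_discovered event
  return pages * PAGE_PRICE + (sitemapDiscovered ? SITEMAP_PRICE : 0);
}

// estimateCostUsd(100, true) === 0.51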

Common Scenarios

Scenario 1: Technical SEO Audit

{
  "startUrls": [{ "url": "https://client-website.com" }],
  "maxCrawlPages": 500,
  "maxCrawlDepth": 5,
  "crawlSitemap": true,
  "demoMode": false
}

Crawl an entire client website to audit meta tags, headings, links, and content structure.

Scenario 2: LLM Content Extraction

{
  "startUrls": [{ "url": "https://documentation-site.com" }],
  "maxCrawlPages": 200,
  "maxCrawlDepth": 3,
  "crawlSitemap": true,
  "demoMode": false
}

Extract clean markdown from documentation sites for RAG pipelines or AI training data.

Scenario 3: Pre-Migration URL Inventory

{
  "startUrls": [{ "url": "https://old-website.com" }],
  "maxCrawlPages": 1000,
  "maxCrawlDepth": 10,
  "crawlSitemap": true,
  "demoMode": false
}

Create a complete URL inventory with metadata before a site migration or redesign.

Webhook & Automation Integration

Zapier / Make.com / n8n

  1. Create a webhook trigger in your automation platform
  2. Paste the generated webhook URL into the actor's webhookUrl input parameter
  3. Route results to Google Sheets, databases, or analysis tools
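
If you would rather receive results at your own endpoint than through an automation platform, a bare-bones receiver looks like the sketch below. The payload shape is an assumption (the exact delivery format is not documented here), so the handler simply logs whatever JSON body the actor posts to webhookUrl:

import { createServer } from "node:http";

// Accept POSTed JSON on port 3000 and log it.
createServer((req, res) => {
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    console.log("Crawl results delivered:", JSON.parse(body));
    res.writeHead(200).end("ok");
  });
}).listen(3000);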

Popular automations:

  • Crawl data -> Google Sheets (SEO audit spreadsheet)
  • Broken pages (4xx/5xx) -> Slack alert (site health monitoring; sketched after this list)
  • Markdown content -> Database (AI training data pipeline)
  • Page metadata -> Airtable (content inventory)
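
As an example, here is a hedged sketch of the broken-pages Slack alert above, posting to a Slack incoming webhook. slackWebhookUrl is your own incoming webhook URL, and items is the dataset result from an earlier run (see the apify-client example above):

// Post a summary of 4xx/5xx pages to Slack via an incoming webhook.
async function alertBrokenPages(
  items: { url: string; statusCode: number }[],
  slackWebhookUrl: string,
): Promise<void> {
  const broken = items.filter((p) => p.statusCode >= 400);
  if (broken.length === 0) return;
  await fetch(slackWebhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      text: `Found ${broken.length} broken pages:\n` +
        broken.map((p) => `• ${p.url} (${p.statusCode})`).join("\n"),
    }),
  });
}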

Apify Scheduled Runs

Schedule weekly or monthly crawls to track site changes and detect regressions.

FAQ

Q: How is this different from Firecrawl?

A: This crawler provides Firecrawl-like markdown extraction using Mozilla Readability + Turndown without requiring an external API key. You get clean, LLM-ready markdown at a fraction of the cost.

Q: Does it handle JavaScript-rendered pages?

A: Yes. The crawler uses a headless browser with fingerprint injection to render JavaScript-heavy pages before extracting content.

Q: Can I crawl password-protected pages?

A: Currently, only publicly accessible pages are supported. The crawler does not handle authentication.

Q: How does sitemap discovery work?

A: The crawler automatically checks for sitemap.xml, sitemap_index.xml, and wp-sitemap.xml at the domain root. Discovered URLs are added to the crawl queue.
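
As an illustration of that discovery logic (not the actor's actual implementation), probing the well-known paths could look like this:

// Probe the standard sitemap locations at the domain root and return
// the first one that responds successfully, or null if none do.
const SITEMAP_PATHS = ["/sitemap.xml", "/sitemap_index.xml", "/wp-sitemap.xml"];

async function findSitemap(origin: string): Promise<string | null> {
  for (const path of SITEMAP_PATHS) {
    const url = new URL(path, origin).toString();
    const res = await fetch(url, { method: "HEAD" });
    if (res.ok) return url;
  }
  return null;
}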

Q: What happens with redirects?

A: Redirects are followed automatically. The final URL and status code are recorded.

Common Problems & Solutions

"Pages not loading"

  • Some sites require JavaScript rendering - this is handled automatically
  • Check if the site has aggressive bot protection
  • Try with residential proxy configuration

"Crawl stops early"

  • Check maxCrawlPages and maxCrawlDepth limits
  • Some sites have few internal links, limiting discovery
  • Enable crawlSitemap: true to discover more URLs

"Missing markdown content"

  • Pages with very little text content produce minimal markdown
  • Image-heavy pages may have low markdownWordCount
  • Check that the page has actual text content

"Demo data showing"

  • Set demoMode: false in your input; no API keys are required for live crawls

📞 Support


Built by John Rippy | Actor Arsenal