Website Content Crawler

Pricing

from $0.01 / 1,000 results

Crawls websites for SEO audits. Extracts HTML, title, meta tags, headings, links, and text content from each page. Also provides automatic sitemap detection and parsing, metadata extraction (title, description, OG tags), heading structure (H1, H2, H3), internal and external link analysis, image extraction with alt text, and word count.

Rating: 0.0 (0)

Developer: John Rippy

Maintained by Community

Actor stats: 0 bookmarked, 15 total users, 9 monthly active users, last modified 4 days ago

"SEO Audit & LLM-Ready Content Extraction" by John Rippy | johnrippy.link

🏆 2025 Zapier Automation Hero of the Year. Project Phoenix: A 95-step AI sales pipeline cutting development time by 50%.


Fast, reliable website crawler built for SEO audits and AI/LLM content analysis. Auto-discovers sitemaps, extracts metadata, headings, and LLM-ready markdown content from every page. A powerful Firecrawl alternative.

Features

  • LLM-Ready Markdown Extraction - Firecrawl-like functionality using Mozilla Readability + Turndown
  • Automatic sitemap detection (sitemap.xml, sitemap_index.xml, wp-sitemap.xml)
  • Full HTML extraction per page
  • Extracts metadata (title, description, OG tags, canonical, robots)
  • Heading structure (H1, H2, H3)
  • Internal and external link analysis
  • Image extraction with alt text
  • Word count and load time metrics
  • Fingerprint injection for bot evasion
  • Configurable crawl depth and page limits
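
To make the sitemap detection feature concrete, here is a minimal sketch that builds the candidate sitemap URLs for a start URL. The three filenames come from the feature list above; probing them at the site root is an assumption, since the crawler's actual detection logic is not documented here. The function name `candidate_sitemap_urls` is hypothetical.

```python
from urllib.parse import urljoin, urlsplit

# Filenames taken from the feature list above.
SITEMAP_FILENAMES = ("sitemap.xml", "sitemap_index.xml", "wp-sitemap.xml")


def candidate_sitemap_urls(start_url):
    """Return sitemap URLs to probe for a start URL.

    Assumption: sitemaps are looked up at the site root. The crawler
    itself may use a different strategy (for example, robots.txt).
    """
    parts = urlsplit(start_url)
    root = f"{parts.scheme}://{parts.netloc}/"
    return [urljoin(root, name) for name in SITEMAP_FILENAMES]


for url in candidate_sitemap_urls("https://example.com/blog/post"):
    print(url)
```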

Input

{
  "startUrls": [{ "url": "https://example.com" }],
  "maxCrawlPages": 25,
  "maxCrawlDepth": 2,
  "crawlSitemap": true,
  "demoMode": false
}

Field           Type      Default    Description
startUrls       array     required   URLs to start crawling from
maxCrawlPages   number    25         Maximum pages to crawl
maxCrawlDepth   number    2          Maximum link depth to follow
crawlSitemap    boolean   true       Auto-discover and parse sitemaps
demoMode        boolean   true       Return mock data (for testing)
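
As an illustration of the input fields, the following sketch assembles an input object in Python. The helper `build_input` is hypothetical and not part of the actor. It always sets `demoMode` explicitly, because the documented default is `true` (mock data) while the example input above uses `false`.

```python
import json


def build_input(start_urls, max_crawl_pages=25, max_crawl_depth=2,
                crawl_sitemap=True, demo_mode=False):
    """Build an input object using the documented field names.

    Hypothetical helper. Note that demo_mode defaults to False here,
    unlike the actor's documented default of true.
    """
    if not start_urls:
        raise ValueError("startUrls is required and must not be empty")
    return {
        "startUrls": [{"url": url} for url in start_urls],
        "maxCrawlPages": max_crawl_pages,
        "maxCrawlDepth": max_crawl_depth,
        "crawlSitemap": crawl_sitemap,
        "demoMode": demo_mode,
    }


actor_input = build_input(["https://example.com"])
print(json.dumps(actor_input, indent=2))
```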

Output

Each page returns:

{
  "url": "https://example.com/page",
  "title": "Page Title",
  "html": "<html>...</html>",
  "text": "Page text content...",
  "markdown": "# Page Title\n\nClean markdown content ready for LLMs...",
  "statusCode": 200,
  "loadTimeMs": 1234,
  "metadata": {
    "description": "Meta description",
    "keywords": "seo, crawler",
    "ogTitle": "Open Graph Title",
    "ogDescription": "OG description",
    "ogImage": "https://example.com/og.jpg",
    "canonical": "https://example.com/page",
    "robots": "index, follow"
  },
  "headings": {
    "h1": ["Main Heading"],
    "h2": ["Subheading 1", "Subheading 2"],
    "h3": []
  },
  "links": {
    "internal": ["https://example.com/other"],
    "external": ["https://external.com"]
  },
  "images": [
    { "src": "https://example.com/image.jpg", "alt": "Alt text" }
  ],
  "wordCount": 1500,
  "markdownWordCount": 1200,
  "crawledAt": "2025-01-09T12:00:00.000Z"
}
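
A record of this shape can be checked for common on-page issues with a few lines of code. The sketch below is one possible approach, not something the actor does itself. The function `audit_page` and the specific rules (exactly one H1, a non-empty meta description, alt text on every image) are illustrative assumptions.

```python
def audit_page(page):
    """Return a list of findings for one output record.

    The rules here are illustrative assumptions, not the actor's own checks.
    """
    findings = []
    if not page.get("title"):
        findings.append("missing title")
    if not page.get("metadata", {}).get("description"):
        findings.append("missing meta description")
    h1_count = len(page.get("headings", {}).get("h1", []))
    if h1_count != 1:
        findings.append(f"expected 1 h1, found {h1_count}")
    missing_alt = [img for img in page.get("images", []) if not img.get("alt")]
    if missing_alt:
        findings.append(f"{len(missing_alt)} image(s) without alt text")
    return findings


# Sample record with made-up values, trimmed to the fields the audit reads.
page = {
    "url": "https://example.com/page",
    "title": "Page Title",
    "metadata": {"description": ""},
    "headings": {"h1": [], "h2": ["Subheading 1"], "h3": []},
    "images": [{"src": "https://example.com/image.jpg", "alt": ""}],
}
print(audit_page(page))
```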

Use Cases

  • SEO Audits - Technical SEO analysis and site health checks
  • LLM Content Preparation - Extract clean markdown for AI training data
  • Content Analysis - Analyze page content structure and quality
  • Site Structure Mapping - Map internal linking and site architecture
  • Broken Link Detection - Find 404s and dead links
  • Meta Tag Analysis - Audit title tags, descriptions, and OG tags
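
For the broken link and site structure use cases, a sketch like the one below could post-process a list of output records. It assumes the dataset contains a record for each crawled URL, including ones that returned an error status, which this page does not confirm. The function `broken_link_sources` is hypothetical.

```python
def broken_link_sources(pages):
    """Map each broken URL to the crawled pages that link to it.

    Assumption: pages with statusCode >= 400 appear in the results.
    Links to pages that were never crawled cannot be checked this way.
    """
    broken = {p["url"] for p in pages if p.get("statusCode", 0) >= 400}
    sources = {url: [] for url in sorted(broken)}
    for p in pages:
        for target in p.get("links", {}).get("internal", []):
            if target in broken:
                sources[target].append(p["url"])
    return sources


# Made-up sample records using the documented field names.
pages = [
    {
        "url": "https://example.com/",
        "statusCode": 200,
        "links": {"internal": ["https://example.com/old"], "external": []},
    },
    {
        "url": "https://example.com/old",
        "statusCode": 404,
        "links": {"internal": [], "external": []},
    },
]
print(broken_link_sources(pages))
```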

Pricing

This actor uses pay-per-event pricing:

Event                Title                Description                                                      Price
page_crawled         Page Crawled         Per page crawled with full HTML, text, and markdown extraction   $0.005
sitemap_discovered   Sitemap Discovered   Per sitemap discovered and parsed for URL extraction             $0.01

Example costs:

  • Crawl 25 pages (no sitemap): 25 × $0.005 = $0.125
  • Crawl 100 pages + sitemap: (100 × $0.005) + $0.01 = $0.51
  • Crawl 500 pages + sitemap: (500 × $0.005) + $0.01 = $2.51
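
The same arithmetic can be sketched as a small estimator. The per-event prices are taken from the table above, and `estimate_cost` is a hypothetical helper. It uses `Decimal` to avoid floating-point rounding in currency math.

```python
from decimal import Decimal

# Per-event prices from the pricing table above, in USD.
PAGE_CRAWLED = Decimal("0.005")
SITEMAP_DISCOVERED = Decimal("0.01")


def estimate_cost(pages, sitemaps=0):
    """Estimate the cost of a run from page and sitemap event counts."""
    return pages * PAGE_CRAWLED + sitemaps * SITEMAP_DISCOVERED


for pages, sitemaps in [(25, 0), (100, 1), (500, 1)]:
    cost = estimate_cost(pages, sitemaps)
    print(f"{pages} pages, {sitemaps} sitemap(s): ${cost:.3f}")
```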

Author

Built by John Rippy | johnrippy.link


Keywords

website crawler, seo audit, sitemap crawler, markdown extraction, firecrawl alternative, llm content, web scraper, content extraction, technical seo