Get Site to Markdown

Pricing

Pay per usage

Developed by

b-w.pro

Maintained by Community

An asynchronous web crawler that mirrors websites into a single organized Markdown file, with handling for images and preservation of directory structure. It is designed to run at low cost and works well for building context for AI agents.

  • Total users: 10
  • Monthly users: 3
  • Runs succeeded: 96%
  • Last modified: 2 months ago

Website to Markdown Crawler

An asynchronous web crawler that mirrors websites into a single organized markdown file, with special handling for images and proper directory structure preservation. Built with Python, asyncio, and httpx.

Author: Jordan Haisley (jordan@b-w.pro)

Features

  • 🚀 Fast asynchronous crawling using httpx and asyncio
  • 📁 Preserves site structure - can be limited to specific subdirectories
  • 🖼️ Smart image handling - preserves both alt text and filenames
  • 📝 Clean Markdown output with proper sectioning
  • 🔍 Depth-controlled crawling
  • 🔒 Domain-restricted recursive crawling for safety
  • 🤫 Quiet mode for silent operation
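The depth-controlled, domain-restricted crawling listed above can be illustrated with a minimal sketch. This is not the author's implementation: the names `crawl`, `in_scope`, `fetch_links`, and the in-memory `FAKE_SITE` are hypothetical, the network fetch is replaced by a dictionary lookup so the example runs offline, and it assumes that a depth of 0 means "the start page only".

```python
import asyncio
from urllib.parse import urljoin, urlparse

# Hypothetical stand-in for a website: page URL -> hrefs found on that page.
FAKE_SITE = {
    "https://example.com/docs/": ["intro", "/docs/api/", "https://other.org/x", "/blog/"],
    "https://example.com/docs/intro": ["/docs/api/"],
    "https://example.com/docs/api/": ["/docs/api/deep"],
    "https://example.com/docs/api/deep": [],
}


async def fetch_links(url):
    """Pretend to download a page and return its raw hrefs (no network I/O)."""
    await asyncio.sleep(0)  # yield control, as a real HTTP request would
    return FAKE_SITE.get(url, [])


def in_scope(url, start_url):
    """Allow only URLs on the start domain and under the start path."""
    start, target = urlparse(start_url), urlparse(url)
    return target.netloc == start.netloc and target.path.startswith(start.path)


async def crawl(start_url, max_depth):
    """Breadth-first crawl; each depth level is fetched concurrently."""
    seen = {start_url}
    frontier = [start_url]
    visited = []
    for depth in range(max_depth + 1):
        visited.extend(frontier)
        if depth == max_depth:
            break  # assumed semantics: do not follow links past max_depth
        results = await asyncio.gather(*(fetch_links(u) for u in frontier))
        next_frontier = []
        for page, hrefs in zip(frontier, results):
            for href in hrefs:
                link = urljoin(page, href)  # resolve relative links
                if link not in seen and in_scope(link, start_url):
                    seen.add(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return visited


if __name__ == "__main__":
    for url in asyncio.run(crawl("https://example.com/docs/", max_depth=1)):
        print(url)
```

Starting from a subdirectory URL such as `https://example.com/docs/` keeps the crawl inside that subdirectory, because `in_scope` compares both the host and the path prefix.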

As an Apify Actor

Example Actor input:

{
  "start_urls": [{"url": "https://example.com"}],
  "max_depth": 1
}
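As a small sketch, an input of this shape could be assembled in Python before starting a run. The helper name `build_run_input` is hypothetical; only the `start_urls` and `max_depth` keys come from the example above.

```python
import json


def build_run_input(urls, max_depth=1):
    """Build an input dict with the two keys shown in the example above."""
    return {
        "start_urls": [{"url": url} for url in urls],
        "max_depth": max_depth,
    }


run_input = build_run_input(["https://example.com"], max_depth=1)
print(json.dumps(run_input, indent=2))
```

With the `apify-client` package, a dict like this is what would be passed as `run_input` to `ApifyClient(token).actor(actor_id).call(...)`. The Actor ID is not given here, so it is left as a placeholder.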

Output Format

The generated markdown file contains:

  • A section for each page
  • Page title as heading
  • Original URL reference
  • Page content in Markdown format
  • Image references with both alt text and filenames

Example output:

# Page Title
*URL: https://example.com/page*
![Alt text (File: image.jpg)](https://example.com/image.jpg)
Page content in markdown...
----------------
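
The section layout above can be reproduced with a short sketch. The function names `image_ref` and `render_page` are hypothetical, and the placement of image references before the body simply mirrors the example; the crawler itself may emit images inline with the page content.

```python
import posixpath
from urllib.parse import urlparse

SEPARATOR = "----------------"  # copied from the example output


def image_ref(alt, src):
    """Format an image so both the alt text and the filename are kept."""
    filename = posixpath.basename(urlparse(src).path)
    return f"![{alt} (File: {filename})]({src})"


def render_page(title, url, body, images=()):
    """Render one page as a Markdown section in the layout shown above."""
    lines = [f"# {title}", f"*URL: {url}*"]
    lines.extend(image_ref(alt, src) for alt, src in images)
    lines.append(body)
    lines.append(SEPARATOR)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_page(
        "Page Title",
        "https://example.com/page",
        "Page content in markdown...",
        images=[("Alt text", "https://example.com/image.jpg")],
    ))
```

Concatenating the result of `render_page` for every crawled page would give a single file with one section per page.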