Get Site to Markdown

jhaisley/get-site

Developed by

b-w.pro

Maintained by Community

Website to Markdown Crawler: an asynchronous web crawler that mirrors websites into a single organized Markdown file, with handling for images and preservation of the site's directory structure. It is designed to run at low cost and works well for building context for AI agents.

Website to Markdown Crawler

An asynchronous web crawler that mirrors websites into a single organized markdown file, with special handling for images and proper directory structure preservation. Built with Python, asyncio, and httpx.

Author: Jordan Haisley (jordan@b-w.pro)

Features

  • 🚀 Fast asynchronous crawling using httpx and asyncio
  • 📁 Preserves site structure; crawling can be limited to specific subdirectories
  • 🖼️ Smart image handling: preserves both alt text and filenames
  • 📝 Clean Markdown output with proper sectioning
  • 🔍 Depth-controlled crawling
  • 🔒 Domain-restricted recursive crawling for safety
  • 🤫 Quiet mode for silent operation
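The features above describe a depth-limited, domain-restricted crawl. The sketch below is not the Actor's code. It is a minimal stdlib-only illustration that assumes `max_depth` counts link hops from the start URL (0 means only the start page) and that "site structure" scoping means the same host plus the start URL's path prefix. `crawl`, `LinkExtractor`, `in_scope`, and the injected `fetch` callable are hypothetical names. Injecting `fetch` keeps the logic runnable offline; in a real run it could wrap an `httpx.AsyncClient` and return `response.text`.

```python
import asyncio
from html.parser import HTMLParser
from typing import Awaitable, Callable
from urllib.parse import urldefrag, urljoin, urlparse

Fetch = Callable[[str], Awaitable[str]]  # hypothetical interface: url -> HTML text


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def in_scope(url: str, root: str) -> bool:
    """http(s) only, same host as root, and under root's path prefix (assumed scoping rule)."""
    u, r = urlparse(url), urlparse(root)
    return (
        u.scheme in ("http", "https")
        and u.netloc == r.netloc
        and u.path.startswith(r.path)
    )


async def crawl(start_url: str, max_depth: int, fetch: Fetch, concurrency: int = 5) -> dict[str, str]:
    """Breadth-first crawl up to max_depth link hops; returns {url: html}."""
    pages: dict[str, str] = {}
    seen = {start_url}
    frontier = [start_url]
    limit = asyncio.Semaphore(concurrency)  # cap simultaneous requests

    async def fetch_one(url: str) -> tuple[str, str]:
        async with limit:
            return url, await fetch(url)

    for depth in range(max_depth + 1):
        if not frontier:
            break
        # Fetch one whole level concurrently; a fetch error propagates from gather.
        results = await asyncio.gather(*(fetch_one(u) for u in frontier))
        next_frontier: list[str] = []
        for url, html in results:
            pages[url] = html
            if depth == max_depth:
                continue  # deepest level: keep the page but do not follow its links
            parser = LinkExtractor()
            parser.feed(html)
            for href in parser.links:
                # Resolve relative links and drop #fragments so /b and /b#top dedupe.
                link, _fragment = urldefrag(urljoin(url, href))
                if link not in seen and in_scope(link, start_url):
                    seen.add(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return pages


if __name__ == "__main__":
    site = {
        "https://example.com/": '<a href="/a">A</a> <a href="https://other.org/">Out</a>',
        "https://example.com/a": '<a href="/b#top">B</a>',
        "https://example.com/b": "",
    }

    async def fake_fetch(url: str) -> str:
        return site.get(url, "")

    print(sorted(asyncio.run(crawl("https://example.com/", 1, fake_fetch))))
    # ['https://example.com/', 'https://example.com/a']
```

Level-by-level `asyncio.gather` keeps depth accounting simple; the off-domain link is dropped by `in_scope`, and `/b` is only reached when `max_depth` is 2 or more.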

As an Apify Actor

Example Actor input:

```json
{
    "start_urls": [{"url": "https://example.com"}],
    "max_depth": 1
}
```
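A consumer of this input typically flattens `start_urls` from a list of `{"url": ...}` objects into plain strings and checks `max_depth`. The sketch below does that; `parse_actor_input`, `CrawlConfig`, and the default depth of 1 are hypothetical and not taken from the Actor. Inside an Apify Python Actor, the input dict is usually obtained with `await Actor.get_input()` from the Apify SDK.

```python
from dataclasses import dataclass


@dataclass
class CrawlConfig:
    start_urls: list[str]
    max_depth: int


def parse_actor_input(data: dict, default_depth: int = 1) -> CrawlConfig:
    """Validate an input dict shaped like the example and flatten start_urls."""
    entries = data.get("start_urls") or []
    # Keep only well-formed {"url": "..."} entries.
    urls = [e["url"] for e in entries if isinstance(e, dict) and e.get("url")]
    if not urls:
        raise ValueError('start_urls must contain at least one {"url": ...} entry')
    depth = data.get("max_depth", default_depth)  # default is an assumption
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(depth, int) or isinstance(depth, bool) or depth < 0:
        raise ValueError("max_depth must be a non-negative integer")
    return CrawlConfig(start_urls=urls, max_depth=depth)


config = parse_actor_input({"start_urls": [{"url": "https://example.com"}], "max_depth": 1})
print(config)  # CrawlConfig(start_urls=['https://example.com'], max_depth=1)
```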

Output Format

The generated markdown file contains:

  • A section for each page
  • Page title as heading
  • Original URL reference
  • Page content in Markdown format
  • Image references with both alt text and filenames
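One way to assemble the per-page sections listed above is sketched here. It follows the layout of the example output in this README: a `# ` heading, an italic `*URL: ...*` line, the body, then a 16-dash separator. `extract_title`, `render_section`, and `render_document` are hypothetical helpers rather than the Actor's API, and falling back to the URL when a page has no `<title>` is an assumption.

```python
from html.parser import HTMLParser

SEPARATOR = "-" * 16  # matches the separator line in the example output


class TitleParser(HTMLParser):
    """Accumulates the text inside the page's <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html: str, fallback: str) -> str:
    """Return the <title> text, or the fallback (e.g. the URL) if it is missing."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip() or fallback


def render_section(title: str, url: str, body: str) -> str:
    """Format one page: heading, URL line, body, separator."""
    return f"# {title}\n*URL: {url}*\n\n{body.strip()}\n\n{SEPARATOR}\n"


def render_document(sections: list[tuple[str, str, str]]) -> str:
    """Concatenate (title, url, body) sections into one Markdown file."""
    return "\n".join(render_section(t, u, b) for t, u, b in sections)


print(render_section("Page Title", "https://example.com/page", "Page content in markdown..."))
```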

Example output:

```markdown
# Page Title
*URL: https://example.com/page*

![Alt text (File: image.jpg)](https://example.com/image.jpg)

Page content in markdown...

----------------
```
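The image line in the example combines the alt text, the file name, and the absolute image URL. The sketch below reproduces that pattern; `image_markdown` is a hypothetical helper, and resolving relative `src` values against the page URL, URL-decoding the file name, escaping square brackets in the alt text, and the label used when alt text is empty are all assumptions rather than documented Actor behavior.

```python
import posixpath
from urllib.parse import unquote, urljoin, urlparse


def image_markdown(alt: str, src: str, page_url: str) -> str:
    """Build ![alt (File: name)](absolute_url) for one <img> tag."""
    absolute = urljoin(page_url, src)  # resolve relative src against the page
    # File name = last path segment, percent-decoded for readability.
    filename = posixpath.basename(unquote(urlparse(absolute).path)) or "image"
    safe_alt = alt.replace("[", r"\[").replace("]", r"\]")  # keep the link syntax intact
    label = f"{safe_alt} (File: {filename})" if safe_alt else f"(File: {filename})"
    return f"![{label}]({absolute})"


print(image_markdown("Alt text", "/image.jpg", "https://example.com/page"))
# ![Alt text (File: image.jpg)](https://example.com/image.jpg)
```

Keeping the file name next to the alt text preserves a hint about the image even when the alt text is empty or generic.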

Pricing

Pricing model

Pay per usage

This Actor uses pay-per-usage pricing: the Actor itself is free to use, and you pay only for the Apify platform resources it consumes.