Dynamic Markdown Scraper
2 hours trial then $19.00/month - No credit card required now
Effortlessly feed LLMs clean Markdown using our advanced web scraper. Seamlessly scrape dynamic, JavaScript-rendered websites while preserving original formatting. Ideal for AI training, documentation, and content migration.
A powerful web scraper that converts difficult-to-scrape web pages into clean, well-formatted Markdown. The scraper crawls websites and automatically transforms their HTML content into Markdown while maintaining the original structure and formatting. It handles dynamic content and JavaScript-rendered pages with ease.
Features
- Crawls websites and converts content to Markdown format
- Maintains proper heading structure, lists, and code blocks
- Handles dynamic content and JavaScript-rendered pages
- Handles images and links correctly
- Restricts crawling to the same domain as the start URL
- Filters out unwanted content (navigation, footers, etc.)
- Configurable maximum crawl limits
- Smart content extraction focusing on main article content
- Built with TypeScript for better maintainability
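The actor's source code isn't reproduced here, but a minimal sketch of the HTML-to-Markdown conversion and content filtering described above, assuming the `cheerio` and `turndown` libraries, could look like this:

```typescript
import * as cheerio from 'cheerio';
import TurndownService from 'turndown';

// Configure the converter to emit ATX headings (#) and fenced code blocks,
// matching the "proper heading structure and code blocks" feature above.
const turndown = new TurndownService({
  headingStyle: 'atx',
  codeBlockStyle: 'fenced',
});

// Drop elements that usually carry no article content (navigation, footers, etc.).
turndown.remove(['script', 'style', 'nav', 'footer']);

// Extract the main article content from rendered HTML and convert it to Markdown.
function htmlToMarkdown(html: string): string {
  const $ = cheerio.load(html);
  const main = $('article').html() ?? $('main').html() ?? $('body').html() ?? '';
  return turndown.turndown(main);
}
```

Rendering the page in a headless browser before conversion is what makes JavaScript-rendered content available to a converter like this; the snippet only illustrates the extraction and conversion step.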
Use Cases
- Feed website content to LLMs for further processing
- Extract content from websites for documentation, blog posts, or technical writing
- Scrape and convert web pages for use in static sites, blogs, or other projects
- Automate content migration from legacy systems to modern platforms
Input Configuration
The scraper accepts the following input parameters:
- `startUrls`: Array of URLs where the crawler should begin (required)
- `maxRequestsPerCrawl`: Maximum number of pages to crawl (optional, defaults to unlimited)
Example input:
```json
{
  "startUrls": [
    { "url": "https://apify.com" }
  ],
  "maxRequestsPerCrawl": 100
}
```
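To start a run with this input programmatically, the standard Apify API client can be used. This is a sketch only: the actor ID below is a placeholder for this actor's real ID, and it assumes an API token is available in the `APIFY_TOKEN` environment variable.

```typescript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// 'username/dynamic-markdown-scraper' is a placeholder; use the ID shown on the actor's page.
const run = await client.actor('username/dynamic-markdown-scraper').call({
  startUrls: [{ url: 'https://apify.com' }],
  maxRequestsPerCrawl: 100,
});

console.log(`Run finished with status: ${run.status}`);
```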
Output Format
The scraper saves the following data for each processed page:
- `url`: The URL of the scraped page
- `title`: Page title
- `markdown`: Converted Markdown content
- `capturedAt`: Timestamp of when the page was scraped
Example output:
```json
{
  "url": "https://apify.com/storage",
  "title": "Storage optimized for scraping · Apify",
  "markdown": "# Apify Storage\n\nScalable and reliable cloud data storage designed for web scraping and automation workloads.\n\n[View documentation](https://docs.apify.com/platform/storage)\n\nBenefits\n\n## Specialized storage from Apify[](https://apify.com/storage#specialized-storage-from-apify)\n\n![Enterprise_grade_reliability_performance_and_scalability_9890860f85.svg](https://cdn-cms.apify.com/Enterprise_grade_reliability_performance_and_scalability_9890860f85.svg)\n\n### Enterprise-grade reliability, performance, and scalability[](https://apify.com/storage#enterprise-grade-reliability-performance-and-scalability)\n\nStore a few records or a few hundred million, with the same low latency and high reliability. We use Amazon Web Services for the underlying data storage, giving you high availability and peace of mind.\n\n### Low-cost storage for web scraping and crawling[](https://apify.com/storage#low-cost-storage-for-web-scraping-and-crawling)\n\nApify provides low-cost storage carefully designed for the large workloads typical of web scraping and crawling operations.\n\n![Low_cost_storage_for_web_scraping_and_crawling_b313f7d95e.svg](https://cdn-cms.apify.com/Low_cost_storage_for_web_scraping_and_crawling_b313f7d95e.svg)\n\n![Easy_to_use_634e40ae76.svg](https://cdn-cms.apify.com/Easy_to_use_634e40ae76.svg)\n\n### Easy to use[](https://apify.com/storage#easy-to-use)\n\nData can be viewed on the web, giving you a quick way to review and share it with other people. The Apify [API](https://docs.apify.com/api/v2) and [SDK](https://docs.apify.com/sdk/js/) makes it easy to integrate our storage into your apps.\n\nFeatures\n\n## We’ve got you covered[](https://apify.com/storage#weve-got-you-covered)\n\n[![Dataset_78dfe4e3a4.svg](https://cdn-cms.apify.com/Dataset_78dfe4e3a4.svg)\n\n**Dataset** \nStore results from your web scraping, crawling or data processing jobs into Apify datasets and export them to various formats like JSON, CSV, XML, RSS, Excel or HTML.\n\n\n\n\n\n](https://docs.apify.com/platform/storage/dataset)[![Request_queue_9e9602319e.svg](https://cdn-cms.apify.com/Request_queue_9e9602319e.svg)\n\n**Request queue** \nMaintain a queue of URLs of web pages in order to recursively crawl websites, starting from initial URLs and adding new links as they are found while skipping duplicates.\n\n\n\n\n\n](https://docs.apify.com/platform/storage/request-queue)[![Key_value_store_bc65220b7d.svg](https://cdn-cms.apify.com/Key_value_store_bc65220b7d.svg)\n\n**Key-value store** \nStore arbitrary data records along with their MIME content type. The records are accessible under a unique name and can be written and read at a rapid rate.\n\n\n\n\n\n](https://docs.apify.com/platform/storage/key-value-store)\n\n## Ready to build your first Actor?[](https://apify.com/storage#ready-to-build-your-first-actor)\n\n[Start developing](https://apify.com/templates)",
  "capturedAt": "2025-01-23T14:01:21.956Z"
}
```
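Once a run finishes, the per-page records can be read from the run's default dataset with the same Apify client. A minimal sketch, reusing the `run` object from the earlier `call()` example; the `ScrapedPage` type simply mirrors the fields documented above:

```typescript
// Shape of each dataset record, as documented above.
type ScrapedPage = { url: string; title: string; markdown: string; capturedAt: string };

// Fetch all items the run stored in its default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();

for (const item of items as ScrapedPage[]) {
  console.log(`${item.url} (${item.title}) captured at ${item.capturedAt}`);
  // item.markdown can now be written to disk, indexed, or passed to an LLM pipeline.
}
```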
Actor Metrics
- 1 monthly user
- 1 star
- Created in Jan 2025
- Modified 11 hours ago