Website Content Crawler

apify/website-content-crawler

Developed by

Apify

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

issue in one run

Closed

The25th opened this issue

There is an issue that made the Actor run for 7 hours continuously, even though the input clearly instructed it to crawl one URL.

See attachments.
Run reference: https://console.apify.com/actors/runs/SeNoaq8duRjsCsqfm#input

Jakub Kopecký (jakub.kopecky)

Hey, thank you for using the Website Content Crawler!

Sorry for the late response. Based on the INPUT.rtf you provided, maxCrawlPages is set to 9999999. If you want to scrape strictly a single page, set this value to 1.
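
For illustration, the suggested setting could look roughly like this as a run input. This is a minimal sketch, not the Actor's documented configuration: maxCrawlPages comes from the reply above, while startUrls and maxCrawlDepth are assumed field names, and the helpers build_single_page_input and run_crawler are hypothetical. The optional run_crawler also assumes the apify-client Python package is installed.

```python
import json
from typing import Optional


def build_single_page_input(url: str) -> dict:
    """Build a run input that limits the crawl to one page (hypothetical helper)."""
    return {
        # Assumed field name and shape for the list of start URLs.
        "startUrls": [{"url": url}],
        # From the reply above: 1 instead of 9999999 caps the run at a single page.
        "maxCrawlPages": 1,
        # Assumed field: a depth of 0 is meant to stop links from being followed.
        "maxCrawlDepth": 0,
    }


def run_crawler(token: str, url: str) -> Optional[dict]:
    """Start the Actor and wait for the run to finish (needs apify-client)."""
    # Imported lazily so the rest of the sketch runs without the package.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    return client.actor("apify/website-content-crawler").call(
        run_input=build_single_page_input(url)
    )


if __name__ == "__main__":
    # Print the input as JSON, for example to paste into the Console input editor.
    print(json.dumps(build_single_page_input("https://example.com"), indent=2))
```

Setting the page limit explicitly is the safer choice here, since the limit is what bounds how long the run can go on.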

Jakub

Pricing

Pricing model

Pay per usage

This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage.
