Website Content Crawler

apify/website-content-crawler

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

The request queue hasn't had activity for 300s, resetting internal state

Closed

MavenAGI opened this issue

We have multiple crawls that are logging this message over and over again without making any progress.

Oscar Rodriguez (Oscardz)

This is a known issue that we are currently investigating. Once it's fixed, I will gladly reimburse the cost of those runs. Sorry for the inconvenience; I will keep you posted on the progress of the fix.

MavenAGI

Thank you. It seems to be related to the 0.3.50 release as we didn't see it happening before then and most of our runs with that build have hit this issue. We're updating our code to request runs with the 0.3.49 build as a workaround.
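
A minimal sketch of that workaround, assuming the run is started through the Apify REST API, where the `build` query parameter selects the Actor build. The helper name `build_run_request`, the example start URL, and the token placeholder are hypothetical; the request is only constructed here, not sent.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, build: str, run_input: dict, token: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request that starts an Actor run on a pinned build."""
    # The REST API addresses Actors as "username~actor-name" instead of "username/actor-name".
    path_id = actor_id.replace("/", "~")
    # Pin the build explicitly so the run does not pick up the default (latest) build.
    query = urllib.parse.urlencode({"build": build})
    url = f"{API_BASE}/acts/{path_id}/runs?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Hypothetical input and token placeholder, for illustration only.
request = build_run_request(
    "apify/website-content-crawler",
    "0.3.49",
    {"startUrls": [{"url": "https://example.com"}]},
    token="<YOUR_APIFY_TOKEN>",
)
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen(request)` would actually start the run. Once a fixed build is confirmed, dropping the pin returns the run to the default build.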

Oscar Rodriguez (Oscardz)

Yes, you are right. There was a bug in the Request Queue, but it's been fixed in the latest version of the Actor. We apologize for the inconvenience, and you will be reimbursed for the spending accordingly.

reachable_mule

I have also had this issue, and the run consumed a large amount of credits. How do I request reimbursement?

Jiří Spilka (jiri.spilka)

I apologize for the delay. The issue has been fixed in the latest version, 0.3.52.

As for the reimbursement, I'm currently looking into it. Please allow me some time.

axstv

We're still seeing this issue in 0.3.52 as well, with the same log message: "The request queue hasn't had activity for 300s, resetting internal state".

I assume that this will be reimbursed as well.

Jiří Spilka (jiri.spilka)

@reachable_mule

"I have also had this issue, and the run consumed a large amount of credits. How do I request reimbursement?"

I got the information that it was reimbursed. Note that you need to increase the platform limit (Billing > Platform limit) by the reimbursed amount to make use of the reimbursement.

Jiří Spilka (jiri.spilka)

@axstv I've checked your runs and noticed the log message "The request queue hasn't had activity for 300s, resetting internal state", but in your case it occurs only for a limited period of time, whereas the other users faced it for hours.

The bigger issue seems to be with handling sitemaps. You're trying to retrieve two results, but the attempt to fetch the sitemaps is causing a block, and it takes around 15 minutes to actually start scraping.

I've created a new issue to track it, and we'll take a closer look. Please subscribe to the new issue. I will close this issue, and we will discuss everything in the new one.

In the meantime, if you don't need to use sitemaps, could you retry with "Consider URLs from Sitemaps" set to False? I tried this and was able to get all the results in under 60 seconds.
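
For runs started through the API rather than the Console, the same setting can be expressed in the run input. This is a sketch only: it assumes the "Consider URLs from Sitemaps" option maps to an input field named `useSitemaps`, which should be checked against the Actor's input schema. The helper `build_crawler_input` and the example URL are hypothetical.

```python
import json


def build_crawler_input(start_urls, use_sitemaps=False):
    """Build a run input with sitemap discovery switched off by default."""
    return {
        "startUrls": [{"url": url} for url in start_urls],
        # Assumed field name for the "Consider URLs from Sitemaps" option;
        # verify it against the Actor's input schema before relying on it.
        "useSitemaps": use_sitemaps,
    }


run_input = build_crawler_input(["https://example.com"])
print(json.dumps(run_input, indent=2))
```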

reachable_mule

Yes, I will try that. Thank you very much for the amazing support.

Developer

Apify

Actor Metrics

3.9k monthly users
711 stars
>99% runs succeeded
2.2 days response time
Created in Mar 2023
Modified 17 days ago

Categories

Developer tools
