Website Content Crawler

apify/website-content-crawler
Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.
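
As a minimal sketch of how the Actor is typically run programmatically (assuming the apify-client Python package; the API token and start URL are placeholders, and the startUrls/maxCrawlDepth input fields and url/text output fields are assumptions based on the Actor's documented schema):

```python
from apify_client import ApifyClient

# Authenticate with your Apify API token (placeholder value).
client = ApifyClient("<YOUR_APIFY_TOKEN>")

# Input for the Website Content Crawler; field names assume the Actor's
# documented input schema. The start URL is a placeholder.
run_input = {
    "startUrls": [{"url": "https://docs.example.com/"}],
    "maxCrawlDepth": 1,  # how many link hops away from the start URLs to follow
}

# Start the Actor and wait for the run to finish.
run = client.actor("apify/website-content-crawler").call(run_input=run_input)

# Iterate over the extracted pages stored in the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item.get("url"), (item.get("text") or "")[:200])
```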

crawling takes longer

Closed
yener.yasin030 opened this issue a month ago

Crawling crawls unimportant things and takes more than an hour. It eats my money; I wasn't at the PC.

jiri.spilka

Hi, thank you for using the Website Content Crawler.

However, I couldn’t find the runId you attached.
I reviewed your runs, and the longest one took ~20 minutes. While it was a bit slow, scraping pages like news.google can be challenging.

Could you please provide the runId where you encountered the issue?

Thank you, Jiri

yener.yasin030, a month ago

I have deleted it by mistake. The last run cost me 12 dollars, something like that...

jiri.spilka

I found the runId: rBiWKBvKfnA6T6bgh in the database, and it was indeed deleted along with the corresponding dataset.

The last status message I can see is:
Crawled 892/4700 pages, 0 failed requests, desired concurrency 2.

However, I'm unable to debug further as I don’t have access to all the necessary information.

Could you please confirm that I can restore (undelete) runId rBiWKBvKfnA6T6bgh so I can investigate further?

yener.yasin030, a month ago

Please restore it; I confirm it.

jiri.spilka

Thank you. It looks like you started the Website Content Crawler with approximately 50 URLs, such as this one, but did not specify maxCrawlDepth. Additionally, the crawler automatically enqueued other links, including those with query parameters, which likely increased the scope of the crawl.

We understand this may have been an oversight. We've reimbursed $6 to your account as compensation.

When you start the Actor with a list of URLs that you only want to scrape directly, always set maxCrawlDepth=0 so the crawler does not follow any links found on those pages.
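
For example (the URLs below are placeholders, shown only to illustrate the setting):

```python
# Illustrative run input: crawl only the listed URLs and nothing else.
run_input = {
    "startUrls": [
        {"url": "https://example.com/page-1"},
        {"url": "https://example.com/page-2"},
    ],
    "maxCrawlDepth": 0,  # do not follow links found on these pages
}
```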

I'll go ahead and close this issue for now, but please feel free to ask any further questions. Jiri

Maintained by Apify

Actor Metrics

  • 5.5k monthly users

  • 999 bookmarks

  • >99% runs succeeded

  • 1.1 days response time

  • Created in Mar 2023

  • Modified 14 days ago