Website Content Crawler
Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.
It only crawls one page. I tried increasing the max depth to 10 and also tried different crawler types (adaptive and Firefox).
Hi, thank you for using Website Content Crawler!
The issue is caused by the scope of your start URLs. Many customers choose to crawl only the specified start URLs and their sub-pages. In your case, with https://www.****startseite.html, the scope is limited to this page and its sub-pages. Since this page has no sub-pages, the crawler doesn't proceed any further.
To fix this, please remove startseite.html from the URL so the crawler can access other pages as well. You can see an example run, which I aborted early to avoid wasting resources.
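To illustrate why dropping the file name widens the crawl, here is a minimal sketch of a prefix-based scope check. It assumes, for illustration only, that "in scope" means "same scheme and host, and the path starts with the start URL's path"; this is not the Actor's actual implementation, and the `example.com` URLs and the `in_scope` helper are hypothetical.

```python
from urllib.parse import urlsplit


def in_scope(start_url: str, candidate: str) -> bool:
    """Hypothetical scope rule: same scheme and host, and the candidate's
    path starts with the start URL's path (an illustrative assumption)."""
    start, cand = urlsplit(start_url), urlsplit(candidate)
    if start.scheme != cand.scheme or start.netloc != cand.netloc:
        return False
    return cand.path.startswith(start.path)


# Hypothetical links discovered on the site
links = [
    "https://example.com/de/startseite.html",
    "https://example.com/de/kontakt.html",
    "https://example.com/de/produkte/pumpen.html",
]

for start in ("https://example.com/de/startseite.html", "https://example.com/de/"):
    kept = [u for u in links if in_scope(start, u)]
    print(start, "->", len(kept), "page(s) in scope")

# https://example.com/de/startseite.html -> 1 page(s) in scope
# https://example.com/de/ -> 3 page(s) in scope
```

With the file name in the start URL, only the page itself matches the prefix; with the directory as the start URL, every page below it qualifies.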
I hope this helps! I'll close this issue now, but feel free to ask any additional questions or raise a new issue.
Thank you for the quick reply, but now we get this error: https://console.apify.com/organization/PljXs4KlVGTIiQCKc/actors/runs/CON38M0HYT7BwZNuE#output
2024-12-05T14:49:22.447Z INFO AdaptiveCrawler: Running browser request handler for https://www.pema-tec.multiscreensite.com/
2024-12-05T14:49:25.726Z ERROR AdaptiveCrawler: Request failed and reached maximum retries. page.goto: SSL_ERROR_UNKNOWN
Any idea why this is happening?
I'm glad I could help.
The site https://www.pema-tec.multiscreensite.com/ is not reachable. When I attempted to access it, I got a "Site not found" error.
However, the issue appears to lie with the www prefix. If you use https://pema-tec.multiscreensite.com/ (without www) as the start URL, the site works correctly.
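Because `www.pema-tec.multiscreensite.com` and `pema-tec.multiscreensite.com` are different hostnames, a site can be served on one and not the other. If you prepare start URLs in a script, a small helper like the hypothetical `strip_www` below removes a leading `www.` from the host. It is a sketch, not part of the Actor, and assumes you have already confirmed that the bare host serves the same site, which is not true in general.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_www(url: str) -> str:
    """Return the URL with a leading 'www.' removed from the host.
    Hypothetical helper: only apply it when the bare host is known to work."""
    parts = urlsplit(url)
    host = parts.netloc
    if host.startswith("www."):
        host = host[len("www."):]
    # Rebuild the URL with only the host changed
    return urlunsplit(parts._replace(netloc=host))


print(strip_www("https://www.pema-tec.multiscreensite.com/"))
# https://pema-tec.multiscreensite.com/
```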
I’ll go ahead and close this issue now, but please feel free to ask additional questions or raise a new issue.
Actor Metrics
3.9k monthly users
718 stars
>99% runs succeeded
2.2 days response time
Created in Mar 2023
Modified 15 hours ago