Website Content Crawler
Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.
I want to set Initial Cookies. To get them, I am using EditThisCookie and pasting the exported cookies directly into the Initial Cookies box.
I keep getting an error saying sameSite can only be Lax|None|... etc. I changed it to Lax, but I am still getting the same error. What am I doing wrong?
Hi, thank you for using Website Content Crawler.
In your case, the error message indicates that the cookie at index 7 of the cookies array does not contain an expected value. If you check your cookies, the 7th index (8th item) contains "sameSite": "lax" in lowercase, while the crawler only accepts the capitalized values "Strict", "Lax", or "None".
2024-12-16T13:15:48.743Z ERROR AdaptiveCrawler: Request failed and reached maximum retries. browserContext.addCookies: cookies[7].sameSite: expected one of (Strict|Lax|None)
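If you have many cookies to fix, a small script can normalize them before you paste them into the Initial Cookies field. This is a minimal sketch assuming an EditThisCookie-style JSON export where each cookie object has a `sameSite` field; the lowercase-to-capitalized mapping (including `no_restriction`, which some exports use for "None") is an assumption you should check against your own export:

```python
import json

# Map common lowercase export values to the capitalized values
# that browserContext.addCookies expects (Strict|Lax|None).
# "no_restriction" appears in some extension exports; this mapping
# is an assumption -- verify it against your own cookie export.
VALID = {"strict": "Strict", "lax": "Lax", "none": "None", "no_restriction": "None"}


def normalize_cookies(cookies):
    """Capitalize any non-conforming sameSite values in a list of cookie dicts."""
    for cookie in cookies:
        same_site = cookie.get("sameSite")
        if isinstance(same_site, str) and same_site not in ("Strict", "Lax", "None"):
            # Fall back to "Lax" for unrecognized values (an assumption).
            cookie["sameSite"] = VALID.get(same_site.lower(), "Lax")
    return cookies


# Hypothetical exported cookie for illustration:
exported = '[{"name": "session", "value": "abc", "sameSite": "lax"}]'
fixed = normalize_cookies(json.loads(exported))
print(fixed[0]["sameSite"])  # Lax
```

You can then re-serialize the result with `json.dumps(fixed)` and paste it into the Initial Cookies box.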
When I made this change, it worked fine, and I was able to get all the results. Please see my example run.
However, for property listings, the Website Content Crawler may not be ideal, as all the results will be returned in one large blob of text.
If you know a bit of web scraping, a better option would be to use Web Scraper to get the data in a structured form, though this requires writing some code.
I hope this helps! Jiri
Hi, I’ll go ahead and close this issue for the Website Content Crawler now. However, feel free to ask any questions or raise a new issue if needed.
Thank you, Jiří, for the sample run; it was very helpful. I used the Web Scraper Actor, and it worked very well. Thank you!
OK. I'm glad that the Web Scraper Actor worked for you! J.