Website Content Crawler

apify/website-content-crawler
Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

I am having trouble putting in Cookies as input.

Closed

chirag_xcvii opened this issue
a month ago

I want to set Initial Cookies. To get them, I am using EditThisCookie and pasting its export directly into the Initial Cookies box.

I keep getting an error saying sameSite can only be Lax|None|... etc. I changed it to Lax but I am still getting the same error. What am I doing wrong?

jiri.spilka

Hi, thank you for using Website Content Crawler.

In your case, the error message indicates that the cookie at index 7 of the cookies array has a sameSite value that is not one of the expected strings.

If you check your cookies, the item at index 7 (the 8th item) contains "sameSite": "lax" in lowercase, whereas the expected value is "Lax" with a capital L.

2024-12-16T13:15:48.743Z ERROR AdaptiveCrawler: Request failed and reached maximum retries. browserContext.addCookies: cookies[7].sameSite: expected one of (Strict|Lax|None)

When I changed this value to "Lax", it worked fine, and I was able to get all the results. Please see my example run.
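
The casing fix can also be applied programmatically before pasting cookies into the Initial Cookies field. The following Python snippet is a minimal sketch, not part of the Actor: the function name `normalize_cookies` and the sample cookie are hypothetical, and the handling of `no_restriction` and `unspecified` assumes those are values a cookie-export extension may emit. Only the accepted set (Strict|Lax|None) comes from the error message above.

```python
import json

# Lowercase spellings mapped to the values named in the error message
# (Strict|Lax|None). "no_restriction" is an assumed export spelling of "None".
_SAME_SITE = {
    "strict": "Strict",
    "lax": "Lax",
    "none": "None",
    "no_restriction": "None",
}


def normalize_cookies(cookies):
    """Return a copy of the cookie list with sameSite values corrected."""
    fixed = []
    for cookie in cookies:
        cookie = dict(cookie)  # copy so the caller's data is not mutated
        raw = cookie.get("sameSite")
        if raw is not None:
            mapped = _SAME_SITE.get(str(raw).lower())
            if mapped is None:
                # Unrecognized value (for example "unspecified"): drop the
                # field instead of guessing, so the browser default applies.
                cookie.pop("sameSite")
            else:
                cookie["sameSite"] = mapped
        fixed.append(cookie)
    return fixed


if __name__ == "__main__":
    # Hypothetical exported cookie, for illustration only.
    exported = '[{"name": "session", "value": "abc", "domain": ".example.com", "sameSite": "lax"}]'
    print(json.dumps(normalize_cookies(json.loads(exported)), indent=2))
```

The printed JSON can then be pasted into the Initial Cookies box in place of the raw export.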

However, for property listings, the Website Content Crawler may not be ideal, as all the results will be returned in one large blob of text.

If you know a bit of web scraping, a better option would be to use Web Scraper and get the data in a structured form, though this requires writing some code.

I hope this helps! Jiri

jiri.spilka

Hi, I’ll go ahead and close this issue for the Website Content Crawler now. However, feel free to ask any questions or raise a new issue if needed.

chirag_xcvii

23 days ago

Thank you, Jiří, for the sample run; it was very helpful. I used the Web Scraper Actor and it worked very well. Thank you.

jiri.spilka

OK. I'm glad that the Web Scraper Actor worked for you! J.
