Website Content Crawler

apify/website-content-crawler

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

Can't scrape, my request has failed

Closed
engaging_integrity opened this issue
a month ago

I tried scraping a specific website, but the request failed and the run produced 0 output items.

jiri.spilka

Hi, thank you for using the Website Content Crawler. I had to tweak the crawler's settings slightly to achieve the desired results.

Upon reviewing the target website (https://www.****.sg/), I noticed that it contains job listings with structured data. For such cases, the Web Scraper might be a better option. However, if you'd like to extract all the text using the Website Content Crawler, you can select the element containing the text, in this case #joblist. Additionally, I enabled the Ignore canonical URLs option because the canonical URLs reported by the website were incorrect. Here’s my example run with all the settings applied.

I hope this helps! I'll go ahead and close this issue for now, but please feel free to reach out if you need further clarification.

Best regards, Jiri
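
For readers who start runs from code rather than from the Console, the settings described in the reply above could be expressed roughly as follows. This is a minimal sketch, not the configuration of the linked example run: the input field names `keepElementsCssSelector` and `ignoreCanonicalUrl` are assumptions about the Actor's input schema and should be checked against the Actor's input documentation, `build_run_input` and `run_crawler` are hypothetical helpers, and `https://example.com/` is a placeholder because the real URL is masked in the thread.

```python
ACTOR_ID = "apify/website-content-crawler"


def build_run_input(start_url, keep_selector="#joblist"):
    """Build an Actor input that mirrors the settings described in the reply.

    The two option names below are assumptions about the input schema.
    """
    return {
        "startUrls": [{"url": start_url}],
        # Assumed name of the option that keeps only the matching element's content.
        "keepElementsCssSelector": keep_selector,
        # Assumed name of the "Ignore canonical URLs" option.
        "ignoreCanonicalUrl": True,
    }


def run_crawler(run_input, token):
    """Start the Actor, wait for it to finish, and return the dataset items."""
    # Third-party dependency (pip install apify-client), imported lazily so the
    # input builder above can be used without it.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Calling `run_crawler(build_run_input("https://example.com/"), token)` with a valid API token would start a run with these settings. If the returned list is still empty, the selector and the canonical URL option are the first things to recheck.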

Developer
Maintained by Apify

Actor Metrics

  • 5.5k monthly users

  • 999 bookmarks

  • >99% runs succeeded

  • 1.1 days response time

  • Created in Mar 2023

  • Modified 14 days ago