Website Content Crawler

apify/website-content-crawler

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

Trying to scrape the target URL for an "Apply" button

Closed
ctcoach opened this issue 25 days ago

I'm trying to scrape this website. All of the data extracts well except for the "Apply" button. It links to an external website, and I'm struggling to capture that target URL. Please help!

jakub.kopecky

Hi, thank you for using the Website Content Crawler.

In this case, the Readable Text transformer removed the button because it is an <a> tag in the HTML, and this transformer removes navigation elements by default. To avoid this, set the HTML transformer to None in the HTML processing settings or, in the JSON input, set "htmlTransformer": "none".
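For illustration, here is a minimal Python sketch of that suggestion. It builds a JSON input with "htmlTransformer": "none" and then shows one way to pull the href of an "Apply" link out of an HTML string. The "startUrls" field, the helper names (build_run_input, LinkFinder, find_link), and the sample URLs are assumptions made for this example, not something confirmed in this thread, so check the Actor's input schema and your own output before relying on them.

```python
import json
from html.parser import HTMLParser


def build_run_input(start_url):
    """Build a crawler input that turns the HTML transformer off."""
    return {
        # Assumed field name; verify it against the Actor's input schema.
        "startUrls": [{"url": start_url}],
        # From the reply above: keeps <a> elements such as the "Apply" button.
        "htmlTransformer": "none",
    }


class LinkFinder(HTMLParser):
    """Collect (text, href) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently being read, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def find_link(html, label):
    """Return the href of the first link whose text equals label, else None."""
    finder = LinkFinder()
    finder.feed(html)
    finder.close()
    for text, href in finder.links:
        if text.lower() == label.lower():
            return href
    return None


if __name__ == "__main__":
    # Hypothetical URL, used only to show the shape of the JSON input.
    print(json.dumps(build_run_input("https://example.com/jobs/1"), indent=2))
```

The second half only helps if the link survives in whatever HTML or text you end up with, which is why the transformer setting comes first.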

See this example run.

Please let me know if you have any further questions. I’ll close this issue for now, but feel free to reach out if you need more help. Jakub

Developer
Maintained by Apify
