Website Content Crawler

apify/website-content-crawler

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗LangChain, LlamaIndex, and the wider LLM ecosystem.

Issue: Bad request

Open

MarketingSkills opened this issue
2 months ago

Hi, I'm trying to run Website Content Crawler in Clay. I've added all the information in the input data field in Clay: "{ "startUrls": [ {"url": " page clients "} ], "maxCrawlDepth": 2, "maxPagesPerCrawl": 100, "linkSelector": "a", "pageFunction": "async function pageFunction(context) { const { $, request, log } = context; log.info(URL: ${request.url}); const content = $('body').text(); log.info(Content: ${content}); return { url: request.url, content }; }" } " I get a Bad Request error. I assume the code I added in the field is not correct. Do you have a guide, or can someone tell me what's wrong with it?

Attached is a screenshot of Clay. The code I sent you is in the input data field (right), and the URL I'm calling is in the "page clients" column. As you can see, I ran the first 10 rows. "Run condition not met" appears because I asked Clay to run only if {{page clients}} is not X. The rows that did run returned an error: Bad Request, with no further information.

Many thanks in advance. CG

jindrich.bar

Hello and thank you for your interest in this Actor!

It looks like the issue might be related to how the input data is structured or possibly an issue with Apify account memory. We're already in touch with Clay - they will soon display better error messages. In the meantime, you can find the actual error messages in the browser Dev Console.

Here are a few basic things to check:

  • Account Memory: If your Apify account is running out of memory, this could be causing the "Bad Request" error. Try starting with only a few rows at once to see if that resolves the issue.
  • Input Data Structure: Ensure that the JSON input is correctly formatted and that all required fields are properly set. Here’s a refined version of your input data:
    {
      "startUrls": [{"url": "https://example.com/page-clients"}],
      "maxCrawlDepth": 2,
      "maxPagesPerCrawl": 100,
      "linkSelector": "a",
      "pageFunction": "async function pageFunction(context) { const { $, request, log } = context; log.info(`URL: ${request.url}`); const content = $('body').text(); log.info(`Content: ${content}`); return { url: request.url, content }; }"
    }

That said, this input looks a bit suspicious, as Website Content Crawler does not accept a pageFunction input parameter. You can copy the correct input configuration as JSON from the Actor's Input tab on the Apify Platform - toggle the Regular | JSON switch and copy the contents of the text area.
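As a rough illustration of the checks above, here is a minimal Python sketch that inspects an input string before it is pasted into Clay. It is not part of the Actor or of Clay: the function name `check_crawler_input` and the `UNSUPPORTED_KEYS` set are hypothetical, the only unsupported field listed is the one mentioned in this thread (pageFunction), and treating startUrls as required is an assumption.

```python
import json
from urllib.parse import urlparse

# Assumption: only "pageFunction" is confirmed as unsupported in this thread.
UNSUPPORTED_KEYS = {"pageFunction"}


def _is_http_url(value) -> bool:
    """True if value is a string holding an absolute http(s) URL."""
    if not isinstance(value, str):
        return False
    try:
        parsed = urlparse(value.strip())
    except ValueError:
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def check_crawler_input(raw: str) -> list:
    """Return a list of human-readable problems found in a JSON input string."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} (line {exc.lineno}, column {exc.colno})"]

    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]

    # Flag fields this thread says the Actor does not accept.
    problems = [f"unsupported field: {key}" for key in data if key in UNSUPPORTED_KEYS]

    # Assumption: startUrls is required and each entry needs an absolute URL.
    start_urls = data.get("startUrls")
    if not isinstance(start_urls, list) or not start_urls:
        problems.append("startUrls must be a non-empty list")
    else:
        for i, entry in enumerate(start_urls):
            url = entry.get("url") if isinstance(entry, dict) else None
            if not _is_http_url(url):
                problems.append(f"startUrls[{i}].url is not an absolute http(s) URL: {url!r}")

    return problems


if __name__ == "__main__":
    sample = '{"startUrls": [{"url": " page clients "}], "pageFunction": "..."}'
    for problem in check_crawler_input(sample):
        print(problem)
```

Run against the original input from this issue, a check like this would flag both the pageFunction field and a start URL that was never replaced with a real address. An empty list only means these basic checks passed, not that the Actor will accept the input.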

Unfortunately, I currently cannot provide more support - but we'll let you know if Clay pushes any updates. Feel free to ask more questions - or close this issue.

Cheers!
