Cheerio Scraper

apify/cheerio-scraper

Crawls websites using raw HTTP requests, parses the HTML with the Cheerio library, and extracts data from the pages using Node.js code. Supports both recursive crawling and lists of URLs. This Actor is a high-performance alternative to apify/web-scraper for websites that do not require JavaScript.

Is there a way to run the request entirely using query param and no body?

Open

mr_apify opened this issue a month ago

For such a request:

curl "https://api.apify.com/v2/acts/apify~cheerio-scraper/run-sync-get-dataset-items?token=" \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{
    "startUrls": [
      { "url": "https://example.com/" },
      { "url": "https://tonytong.mystrikingly.com/" }
    ],
    "linkSelector": "a[href]",
    "pageFunction": "async function pageFunction(context) { const { $, request, log } = context; const pageTitle = $(\"title\").first().text(); const pageContent = $(\"body\").html(); log.info(\"Page scraped\", { url: request.url, pageTitle }); return { url: request.url, pageTitle, content: pageContent }; }",
    "proxyConfiguration": { "useApifyProxy": true },
    "waitFor": true
  }'

Can I replace the body payload with query parameters? I'm trying to connect this to an LLM tool, and it seems to have trouble handling a request body.
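For reference, here is a minimal Python sketch of the same body-based call, using only the standard library. The endpoint, input fields, and page function are copied from the curl command above. The names `build_request`, `API_URL`, `PAGE_FUNCTION`, and the `YOUR_API_TOKEN` placeholder are illustrative assumptions, not part of the Apify API. It does not show a query-parameter variant; it only builds the JSON body programmatically, so `json.dumps` handles the escaping of the double quotes inside `pageFunction`.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl command in the question.
API_URL = "https://api.apify.com/v2/acts/apify~cheerio-scraper/run-sync-get-dataset-items"

# Same page function as in the curl command, kept as ordinary source text.
PAGE_FUNCTION = """async function pageFunction(context) {
    const { $, request, log } = context;
    const pageTitle = $("title").first().text();
    const pageContent = $("body").html();
    log.info("Page scraped", { url: request.url, pageTitle });
    return { url: request.url, pageTitle, content: pageContent };
}"""


def build_request(token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl command.

    `build_request` is a hypothetical helper for illustration only.
    """
    payload = {
        "startUrls": [
            {"url": "https://example.com/"},
            {"url": "https://tonytong.mystrikingly.com/"},
        ],
        "linkSelector": "a[href]",
        "pageFunction": PAGE_FUNCTION,
        "proxyConfiguration": {"useApifyProxy": True},
        "waitFor": True,
    }
    # Only the token travels in the query string, as in the original command.
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        f"{API_URL}?{query}",
        data=json.dumps(payload).encode("utf-8"),  # escapes quotes and newlines
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # YOUR_API_TOKEN is a placeholder; nothing is sent over the network here.
    req = build_request("YOUR_API_TOKEN")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

To actually run the Actor, the request object could be passed to `urllib.request.urlopen(req)` with a real token, and the response parsed with `json.load`.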

Developer
Maintained by Apify

Actor Metrics

  • 443 monthly users

  • 93 stars

  • >99% runs succeeded

  • 25 days response time

  • Created in Apr 2019

  • Modified 2 months ago