Cheerio Scraper
Crawls websites using raw HTTP requests, parses the HTML with the Cheerio library, and extracts data from the pages using Node.js code. Supports both recursive crawling and lists of URLs. This Actor is a high-performance alternative to apify/web-scraper for websites that do not require JavaScript.
For such a request:
curl "https://api.apify.com/v2/acts/apify~cheerio-scraper/run-sync-get-dataset-items?token=" -X POST -H 'Content-Type: application/json' -d '{ "startUrls": [ { "url": "https://example.com/" }, { "url": "https://tonytong.mystrikingly.com/" } ], "linkSelector": "a[href]", "pageFunction": "async function pageFunction(context) { const { $, request, log } = context; const pageTitle = $(\"title\").first().text(); const pageContent = $(\"body\").html(); log.info(\"Page scraped\", { url: request.url, pageTitle }); return { url: request.url, pageTitle, content: pageContent }; }", "proxyConfiguration": { "useApifyProxy": true }, "waitFor": true }'
Can I replace the body payload with query parameters? I'm trying to connect this to an LLM tool, and it seems to have an issue handling a request body.
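For reference, here is a minimal Node.js sketch of how the same request could be assembled so that the quotes inside pageFunction are escaped by JSON.stringify rather than by hand. It only builds the request and does not say whether the API accepts the input as query parameters. The helper name buildRunRequest and the APIFY_TOKEN environment variable are assumptions made for illustration; the endpoint and input fields are copied from the curl command above.

```javascript
// Sketch only: builds the same POST request as the curl command, without sending it.
// `buildRunRequest` is a hypothetical helper, not part of any Apify SDK.

// The page function from the curl command, written as a real function.
// It is converted to a string below because the input expects source text.
async function pageFunction(context) {
    const { $, request, log } = context;
    const pageTitle = $("title").first().text();
    const pageContent = $("body").html();
    log.info("Page scraped", { url: request.url, pageTitle });
    return { url: request.url, pageTitle, content: pageContent };
}

function buildRunRequest(token) {
    const url = new URL(
        "https://api.apify.com/v2/acts/apify~cheerio-scraper/run-sync-get-dataset-items"
    );
    // The token travels in the query string, as in the original request.
    url.searchParams.set("token", token);

    // Input fields copied as-is from the request body in the question.
    const input = {
        startUrls: [
            { url: "https://example.com/" },
            { url: "https://tonytong.mystrikingly.com/" },
        ],
        linkSelector: "a[href]",
        pageFunction: pageFunction.toString(),
        proxyConfiguration: { useApifyProxy: true },
        waitFor: true,
    };

    return {
        url: url.toString(),
        options: {
            method: "POST",
            headers: { "Content-Type": "application/json" },
            // JSON.stringify escapes the double quotes inside pageFunction.
            body: JSON.stringify(input),
        },
    };
}
```

To send it from Node.js 18 or later, which ships a global fetch, something like `const { url, options } = buildRunRequest(process.env.APIFY_TOKEN); const items = await (await fetch(url, options)).json();` should work, assuming the token is stored in that variable.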
Actor Metrics
443 monthly users
93 stars
>99% runs succeeded
25 days response time
Created in Apr 2019
Modified 2 months ago