Cheerio Scraper
Crawls websites using raw HTTP requests, parses the HTML with the Cheerio library, and extracts data from the pages using Node.js code. Supports both recursive crawling and lists of URLs. This Actor is a high-performance alternative to apify/web-scraper for websites that do not require JavaScript.
I want the curl request to return the scraped data directly in the terminal within a single request, but it only returns the settings of the run.
curl "https://api.apify.com/v2/acts/apify~cheerio-scraper/runs?token=" \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{
    "startUrls": [
      { "url": "https://example.com/" },
      { "url": "https://tonytong.mystrikingly.com/" }
    ],
    "linkSelector": "a[href]",
    "pageFunction": "async function pageFunction(context) { const { $, request, log } = context; const pageTitle = $(\"title\").first().text(); const pageContent = $(\"body\").html(); log.info(\"Page scraped\", { url: request.url, pageTitle }); return { url: request.url, pageTitle, content: pageContent }; }",
    "proxyConfiguration": {
      "useApifyProxy": true
    }
  }'
Can you check and let me know how to change this query?
I found it: the URL needs to use the run-sync-get-dataset-items endpoint instead of runs.
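For anyone landing here with the same problem, the fix might look roughly like the sketch below. It swaps the runs path for run-sync-get-dataset-items, which, as far as I understand the Apify API, waits for the run to finish and returns the dataset items in the response body. The APIFY_TOKEN environment variable and the run_sync_get_items function name are my own assumptions for illustration, and the input is trimmed to a single start URL to keep it short.

```shell
# Assumption: APIFY_TOKEN is a hypothetical environment variable holding your API token.
ACTOR_ID="apify~cheerio-scraper"
API_URL="https://api.apify.com/v2/acts/${ACTOR_ID}/run-sync-get-dataset-items"

# Actor input as JSON. The quoted heredoc delimiter keeps the text literal,
# so the \" escapes inside pageFunction reach the API unchanged.
PAYLOAD=$(cat <<'JSON'
{
  "startUrls": [{ "url": "https://example.com/" }],
  "pageFunction": "async function pageFunction(context) { const { $, request } = context; return { url: request.url, pageTitle: $(\"title\").first().text() }; }",
  "proxyConfiguration": { "useApifyProxy": true }
}
JSON
)

# Hypothetical helper: starts the run, waits for it, and prints the dataset items.
run_sync_get_items() {
  curl -sS -X POST "${API_URL}?token=${APIFY_TOKEN}" \
    -H 'Content-Type: application/json' \
    -d "$PAYLOAD"
}
```

To try it, export APIFY_TOKEN and call run_sync_get_items. Synchronous endpoints hold the HTTP connection open until the run ends, so a long recursive crawl may be better served by the regular runs endpoint followed by a separate request for the dataset.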
Actor Metrics
442 monthly users
93 stars
>99% runs succeeded
28 days response time
Created in Apr 2019
Modified 2 months ago