Cheerio Scraper
Crawls websites using raw HTTP requests, parses the HTML with the Cheerio library, and extracts data from the pages using Node.js code. Supports both recursive crawling and lists of URLs. This actor is a high-performance alternative to apify/web-scraper for websites that do not require JavaScript.
How to return the response within the POST request
I want the curl request to return the scraped data directly in the terminal in a single request, but it only returns the settings of the run.
curl "https://api.apify.com/v2/acts/apify~cheerio-scraper/runs?token=" \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{
    "startUrls": [
      { "url": "https://example.com/" },
      { "url": "https://tonytong.mystrikingly.com/" }
    ],
    "linkSelector": "a[href]",
    "pageFunction": "async function pageFunction(context) { const { $, request, log } = context; const pageTitle = $(\"title\").first().text(); const pageContent = $(\"body\").html(); log.info(\"Page scraped\", { url: request.url, pageTitle }); return { url: request.url, pageTitle, content: pageContent }; }",
    "proxyConfiguration": {
      "useApifyProxy": true
    }
  }'
Can you check and let me know how to change this query?
mr_apify
I found it: you need to use the run-sync-get-dataset-items endpoint in the URL instead of runs.
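For anyone landing here later, a minimal sketch of what that change could look like. It assumes the token is stored in an environment variable called APIFY_TOKEN (a name chosen here for illustration, not something from the original post), and it uses a single start URL without linkSelector to keep the run short. The request is only sent when the token is set.

```shell
#!/bin/sh
# Sketch only: swaps /runs for /run-sync-get-dataset-items so the response
# body contains the scraped dataset items instead of the run settings.

ACTOR="apify~cheerio-scraper"
ENDPOINT="https://api.apify.com/v2/acts/${ACTOR}/run-sync-get-dataset-items"

# Actor input as JSON. Double quotes inside pageFunction are escaped with \"
# so the JSON stays valid; the shell single quotes pass the backslashes through.
PAYLOAD='{
  "startUrls": [
    { "url": "https://example.com/" }
  ],
  "pageFunction": "async function pageFunction(context) { const { $, request } = context; const pageTitle = $(\"title\").first().text(); return { url: request.url, pageTitle }; }",
  "proxyConfiguration": { "useApifyProxy": true }
}'

# APIFY_TOKEN is an assumed variable name; export your own token under it.
if [ -n "${APIFY_TOKEN:-}" ]; then
  curl -sS -X POST "${ENDPOINT}?token=${APIFY_TOKEN}" \
    -H 'Content-Type: application/json' \
    -d "$PAYLOAD"
else
  echo "Set APIFY_TOKEN to send the request" >&2
fi
```

The synchronous endpoint keeps the HTTP connection open until the run finishes, so it suits short runs. As far as I know the wait is capped at a few minutes, so a recursive crawl with linkSelector may be better served by the asynchronous runs endpoint followed by a separate dataset fetch.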