Pricing

Pay per usage

Cheerio Scraper

Developed by

Apify

Crawls websites using raw HTTP requests, parses the HTML with the Cheerio library, and extracts data from the pages using Node.js code. Supports both recursive crawling and lists of URLs. This Actor is a high-performance alternative to apify/web-scraper for websites that do not require JavaScript.

4.7 (10)

177

Total users

9.1K

Monthly users

918

Runs succeeded

>99%

Issues response

12 days

Last modified

2 months ago

Developer tools

Open source

can you scrape this website? https://therealdeal.com/miami/

Closed

andruchiii opened this issue

I would like to get the article link, article publish date and article title. Is it possible?

Jindřich Bär (jindrich.bar)

Hi, thanks for your question!

Yes, it's possible to scrape data from https://therealdeal.com/miami/. I tested it using the Cheerio Scraper by extracting the __NEXT_DATA__ element, which contains the article metadata in a structured format.
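For context, Next.js pages typically ship their initial data as JSON inside a script tag with the id `__NEXT_DATA__`. The sketch below is only an illustration of that structure: the markup and the `title` values are made up, and a regular expression stands in for the Cheerio selector so the snippet runs without any dependencies. It is not how the Actor itself reads the page.

```javascript
// Made-up sample markup; the real page's payload is much larger
// and its post objects carry fields not shown here.
const sampleHtml = `
<html><body>
<script id="__NEXT_DATA__" type="application/json">
{"props":{"pageProps":{"data":{
  "editorialPickPosts":[{"title":"Sample pick"}],
  "posts":{"nodes":[{"title":"Sample post A"},{"title":"Sample post B"}]}
}}}}
</script>
</body></html>`;

// Regex stand-in for $('#__NEXT_DATA__').html(); adequate for this sample only.
function extractNextData(html) {
    const match = html.match(/<script id="__NEXT_DATA__"[^>]*>([\s\S]*?)<\/script>/);
    if (!match) return null; // no embedded payload found
    return JSON.parse(match[1]);
}

// Follow the same path the Page function uses: props.pageProps.data
const sampleData = extractNextData(sampleHtml).props.pageProps.data;
const sampleCounts = {
    picks: sampleData.editorialPickPosts.length,
    posts: sampleData.posts.nodes.length,
};
console.log(sampleCounts);
```

In the Actor, Cheerio handles the HTML parsing, so only the `JSON.parse` step and the property path carry over.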

The Page function should look something like this:

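async function pageFunction(context) {
    const { $, request } = context;

    // Next.js embeds the page data as JSON in <script id="__NEXT_DATA__">.
    const nextDataScript = $('#__NEXT_DATA__').html();
    if (!nextDataScript) {
        throw new Error(`__NEXT_DATA__ not found on ${request.url}`);
    }
    const nextData = JSON.parse(nextDataScript);
    const { data } = nextData.props.pageProps;

    // Combine the editorial picks with the regular post list.
    return [
        ...data.editorialPickPosts,
        ...data.posts.nodes,
    ];
}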

You can check out my test run here: https://console.apify.com/view/runs/IuBxlWpfxWfPZpItN . Feel free to copy my Actor input and customize it to fit your use case.

I'll close this issue now, but feel free to ask additional questions if you have any. Cheers!
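The Page function above returns the full post objects. To keep only the three fields asked for in the issue (link, publish date, title), the posts could be mapped to a smaller shape. This is a minimal sketch under stated assumptions: the field names `title`, `uri`, and `date` are hypothetical and were not confirmed against the real payload, and `toArticle` and `SITE_ORIGIN` are illustrative names. Inspect one item from the test run's dataset to find the actual keys before relying on it.

```javascript
// ASSUMPTION: posts expose `title`, `uri`, and `date`. These names are
// hypothetical; check a real dataset item and adjust them as needed.
const SITE_ORIGIN = 'https://therealdeal.com';

function toArticle(post) {
    return {
        title: post.title ?? null,
        // Resolve a relative path against the site origin when one is present.
        link: post.uri ? new URL(post.uri, SITE_ORIGIN).href : null,
        publishedAt: post.date ?? null,
    };
}

// Made-up posts standing in for editorialPickPosts and posts.nodes.
const examplePosts = [
    {
        title: 'Sample headline',
        uri: '/miami/2024/01/01/sample-headline/',
        date: '2024-01-01T12:00:00',
        extra: 'dropped by toArticle',
    },
    { title: 'Post without link or date' },
];

const articles = examplePosts.map(toArticle);
console.log(articles);
```

Once the field names are confirmed, the array returned by the Page function could be passed through `.map(toArticle)` before it is returned.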

Puppeteer Scraper

apify/puppeteer-scraper

Crawls websites with headless Chrome and the Puppeteer library using provided server-side Node.js code. This crawler is an alternative to apify/web-scraper that gives you finer control over the process. Supports both recursive crawling and lists of URLs. Supports logging in to websites.

Apify

8.4K

5.0

Web Scraper

apify/web-scraper

Crawls arbitrary websites using a web browser and extracts structured data from web pages using a provided JavaScript function. The Actor supports both recursive crawling and lists of URLs, and automatically manages concurrency for maximum performance.

Apify

90K

4.4

BeautifulSoup Scraper

apify/beautifulsoup-scraper

Crawls websites using raw HTTP requests. It parses the HTML with the BeautifulSoup library and extracts data from the pages using Python code. Supports both recursive crawling and lists of URLs. This Actor is a Python alternative to Cheerio Scraper.

Apify

870

4.2

Vanilla JS Scraper

mstephen190/vanilla-js-scraper

Scrape the web using familiar JavaScript methods! Crawls websites using raw HTTP requests, parses the HTML with the JSDOM package, and extracts data from the pages using Node.js code. Supports both recursive crawling and lists of URLs. This Actor is a non-jQuery alternative to Cheerio Scraper.

Matthias Stephens

471

Playwright Scraper

apify/playwright-scraper

Crawls websites with a headless Chromium, Chrome, or Firefox browser and the Playwright library using provided server-side Node.js code. Supports both recursive crawling and lists of URLs. Supports logging in to websites.

Apify

3.6

HTML Scraper pro

scrapingxpert/html-scraper-pro

HTML Scraper Pro extracts the HTML source code and metadata from websites. It retrieves the full HTML content of web pages, the page title, and the HTTP status code. This tool is suited to data extraction, website analysis, and archiving.

scrapingxpert

100

Camoufox Scraper

apify/camoufox-scraper

Crawls websites with the stealthy Camoufox browser and the Playwright library using provided server-side Node.js code. Supports both recursive crawling and lists of URLs. Supports logging in to websites.

Apify

Javascript Library Detail Scraper

cykieffodh/javascript-library-detail-scraper

Javascript Library Detail Scraper

Michael Laflin

JSDOM Scraper

apify/jsdom-scraper

Parses the HTML using the JSDOM library, providing the same DOM API as browsers do (e.g. `window`). It is able to process client-side JavaScript without using a real browser. Performance-wise, it stands somewhere between the Cheerio Scraper and the browser scrapers.

Apify

4.3

Nodejs Runner

martin.forejt/nodejs-runner

This Actor allows you to quickly run arbitrary JavaScript code in a real Node.js environment, making it ideal for testing, debugging, or executing small scripts without setting up a local Node.js instance. The actor spawns a separate Node.js process to run the provided code and captures the logs.