JSDOM Scraper

Parses the HTML using the JSDOM library, providing the same DOM API as browsers do (e.g. `window`). It is able to process client-side JavaScript without using a real browser. Performance-wise, it stands somewhere between the Cheerio Scraper and the browser scrapers.

Pricing: Pay per usage

Rating: 4.3 (3)

Developer: Apify

Actor stats

Bookmarked

147

Total users

Monthly active users

2 months ago

Last modified

Categories

Developer tools

Open source

You can access the JSDOM Scraper programmatically from your own applications by using the Apify API, in the language or tool of your choice. To use the Apify API, you’ll need an Apify account and your API token, found under Integrations in Apify Console.

# Set API token
API_TOKEN=<YOUR_API_TOKEN>

# Prepare Actor input
cat > input.json << 'EOF'
{
  "startUrls": [
    {
      "url": "https://crawlee.dev/js"
    }
  ],
  "respectRobotsTxtFile": true,
  "globs": [
    {
      "glob": "https://crawlee.dev/js/*/*"
    }
  ],
  "pseudoUrls": [],
  "excludes": [
    {
      "glob": "/**/*.{png,jpg,jpeg,pdf}"
    }
  ],
  "linkSelector": "a[href]",
  "runScripts": false,
  "pageFunction": "async function pageFunction(context) {\n    const { window, request, log } = context;\n\n    // The \"window\" property contains the JSDOM object which is useful\n    // for querying DOM elements and extracting data from them.\n    const pageTitle = window.document.title;\n\n    // The \"request\" property contains various information about the web page loaded. \n    const url = request.url;\n    \n    // Use \"log\" object to print information to Actor log.\n    log.info('Page scraped', { url, pageTitle });\n\n    // Return an object with the data extracted from the page.\n    // It will be stored to the resulting dataset.\n    return {\n        url,\n        pageTitle\n    };\n}",
  "proxyConfiguration": {
    "useApifyProxy": true
  },
  "initialCookies": [],
  "additionalMimeTypes": [],
  "preNavigationHooks": "// We need to return array of (possibly async) functions here.\n// The functions accept two arguments: the \"crawlingContext\" object\n// and \"requestAsBrowserOptions\" which are passed to the `requestAsBrowser()`\n// function the crawler calls to navigate..\n[\n    async (crawlingContext, requestAsBrowserOptions) => {\n        // ...\n    }\n]",
  "postNavigationHooks": "// We need to return array of (possibly async) functions here.\n// The functions accept a single argument: the \"crawlingContext\" object.\n[\n    async (crawlingContext) => {\n        // ...\n    },\n]",
  "customData": {}
}
EOF

# Run the Actor using an HTTP API
# See the full API reference at https://docs.apify.com/api/v2
curl "https://api.apify.com/v2/acts/apify~jsdom-scraper/runs?token=$API_TOKEN" \
  -X POST \
  -d @input.json \
  -H 'Content-Type: application/json'
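
For readers who prefer Python to curl, the same request can be sketched with the standard library alone. This is a minimal sketch, not an official client: the helper names `build_run_request` and `start_run` are made up for illustration, and `start_run` assumes the `input.json` file from the example above exists in the working directory.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example above.
RUNS_URL = "https://api.apify.com/v2/acts/apify~jsdom-scraper/runs"


def build_run_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build (without sending) the POST request that starts an Actor run.

    Hypothetical helper, not part of any Apify library.
    """
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        f"{RUNS_URL}?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_run(token: str, input_path: str = "input.json") -> dict:
    """Load the Actor input from disk, send the request, return the parsed JSON."""
    with open(input_path, encoding="utf-8") as f:
        actor_input = json.load(f)
    with urllib.request.urlopen(build_run_request(token, actor_input)) as resp:
        return json.load(resp)
```

Only `start_run` touches the network. `build_run_request` on its own is handy for inspecting the URL, headers, and body before anything is sent.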

JSDOM Scraper API

Below, you can find a list of relevant HTTP API endpoints for calling the JSDOM Scraper Actor. For this, you’ll need an Apify account. Replace <YOUR_API_TOKEN> in the URLs with your Apify API token, which you can find under Integrations in Apify Console. For details, see the API reference.

Run Actor

POST https://api.apify.com/v2/acts/apify~jsdom-scraper/runs?token=<YOUR_API_TOKEN>

Note: If you add the method=POST query parameter, you can call this API endpoint with a GET request, which makes it usable in third-party webhooks. For details, refer to the Run Actor API documentation.
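
As a small illustration of that note, the URL for a GET-only caller could be assembled as follows. The function name `webhook_run_url` is hypothetical. Only the `method=POST` parameter and the endpoint come from the text above.

```python
import urllib.parse

# Run Actor endpoint as listed above.
RUNS_URL = "https://api.apify.com/v2/acts/apify~jsdom-scraper/runs"


def webhook_run_url(token: str) -> str:
    """Return a Run Actor URL that can be called with a plain GET request.

    Hypothetical helper: it only appends the token and method=POST
    to the query string, URL-encoding both values.
    """
    query = urllib.parse.urlencode({"token": token, "method": "POST"})
    return f"{RUNS_URL}?{query}"
```

The resulting string can be pasted into a service that only supports GET callbacks. Because the token is part of the URL, treat it as a secret.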

Run Actor synchronously and get dataset items

POST https://api.apify.com/v2/acts/apify~jsdom-scraper/run-sync-get-dataset-items?token=<YOUR_API_TOKEN>

Note: This endpoint supports both POST and GET request methods. However, only the POST method allows you to pass input data. For more information, please refer to our Run Actor synchronously and get dataset items API documentation.
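
A minimal Python sketch of using this endpoint follows. It assumes the response body is a JSON array of dataset items, and that each item carries the `url` and `pageTitle` fields returned by the default `pageFunction` shown earlier. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Synchronous endpoint as listed above.
SYNC_URL = (
    "https://api.apify.com/v2/acts/apify~jsdom-scraper/"
    "run-sync-get-dataset-items"
)


def build_sync_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build (without sending) a POST request that carries the Actor input."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        f"{SYNC_URL}?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_items(body: bytes) -> list:
    """Decode a response body that is assumed to be a JSON array of items."""
    items = json.loads(body)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of dataset items")
    return items


def titles_by_url(items: list) -> dict:
    """Map each scraped URL to its page title, skipping items without a URL."""
    return {item["url"]: item.get("pageTitle") for item in items if "url" in item}
```

To send the request, pass the result of `build_sync_request` to `urllib.request.urlopen` and feed the response bytes to `parse_items`. The call blocks until the run finishes, so a generous timeout is advisable.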

Get Actor

GET https://api.apify.com/v2/acts/apify~jsdom-scraper?token=<YOUR_API_TOKEN>

For more information, please refer to our Get Actor API documentation.

Actors can be used to scrape web pages, extract data, or automate browser tasks. Use the JSDOM Scraper API programmatically via the Apify API.

You can choose from:

JSDOM Scraper API in Python

JSDOM Scraper API in JavaScript

JSDOM Scraper API through CLI

JSDOM Scraper OpenAPI definition

You can start JSDOM Scraper with the Apify API by sending an HTTP POST request to the Run Actor endpoint. The Actor’s input is passed as the payload of the POST request, together with its content type, and additional options can be specified using URL query parameters. Within the API, the JSDOM Scraper is identified by its ID, which combines the creator’s username and the Actor’s name (apify~jsdom-scraper in the endpoint URLs above).
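
The ID format and the use of query parameters can be sketched as follows. The helper names are hypothetical. The `memory` and `timeout` options in the usage comment are assumptions drawn from the general Apify API reference rather than from this page, so check them there before relying on them.

```python
import urllib.parse

API_BASE = "https://api.apify.com/v2"


def actor_id(username: str, actor_name: str) -> str:
    """Join the creator's username and the Actor name with a tilde."""
    return f"{username}~{actor_name}"


def run_actor_url(username: str, actor_name: str, token: str, **options) -> str:
    """Build the Run Actor URL, appending any extra options as query parameters."""
    query = urllib.parse.urlencode({"token": token, **options})
    return f"{API_BASE}/acts/{actor_id(username, actor_name)}/runs?{query}"


# Usage (option names are assumptions, see the note above):
# run_actor_url("apify", "jsdom-scraper", "<YOUR_API_TOKEN>", memory=1024, timeout=300)
```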

When the JSDOM Scraper run finishes, you can list the data from its default dataset (storage) via the API, or you can preview the data directly in Apify Console.
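
One way to get from a finished run to its data is to read the dataset ID out of the run object and build the items URL from it. The sketch below rests on several assumptions not stated on this page: that the API wraps the run object in a top-level `data` key, that the run carries `status` and `defaultDatasetId` fields, and that items are served at `/datasets/{id}/items` with a `format` parameter. The function name is hypothetical.

```python
import urllib.parse

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(run_response: dict, token: str, fmt: str = "json") -> str:
    """Return the URL that lists items of a run's default dataset.

    Assumes run_response looks like
    {"data": {"status": "...", "defaultDatasetId": "..."}}.
    """
    run = run_response["data"]
    # Only a run that finished successfully is expected to hold complete data.
    if run.get("status") != "SUCCEEDED":
        raise RuntimeError(f"run has not succeeded: {run.get('status')}")
    query = urllib.parse.urlencode({"token": token, "format": fmt})
    return f"{API_BASE}/datasets/{run['defaultDatasetId']}/items?{query}"
```

Raising on any status other than `SUCCEEDED` is a deliberate simplification. A real client would more likely poll the run until it reaches a terminal state.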

Cheerio Scraper

apify/cheerio-scraper

Crawls websites using raw HTTP requests, parses the HTML with the Cheerio library, and extracts data from the pages using Node.js code. Supports both recursive crawling and lists of URLs. This Actor is a high-performance alternative to apify/web-scraper for websites that do not require JavaScript.

Apify

16K

4.5

Vanilla JS Scraper

mstephen190/vanilla-js-scraper

Scrape the web using familiar JavaScript methods! Crawls websites using raw HTTP requests, parses the HTML with the JSDOM package, and extracts data from the pages using Node.js code. Supports both recursive crawling and lists of URLs. This Actor is a non-jQuery alternative to Cheerio Scraper.

Matthias Stephens

519

Price Drop Tracker - Monitor Any E-commerce Product

alizarin_refrigerator-owner/price-drop-tracker---monitor-any-e-commerce-product

Actor for scraping data from a single web page. The URL of the web page is passed in via input, defined by the input schema. It uses the Axios client to get the HTML of the page & the Cheerio library to parse the data from it. The data are then stored in a dataset where you can easily access them.

The Howlers

Reviewsio Reviews Scraper

getdataforme/reviewsio-reviews-scraper

Project Cheerio Crawler Javascript is a robust web scraping tool using the Cheerio library to extract structured data efficiently from websites....

GetDataForMe

BeautifulSoup Scraper

apify/beautifulsoup-scraper

Crawls websites using raw HTTP requests. It parses the HTML with the BeautifulSoup library and extracts data from the pages using Python code. Supports both recursive crawling and lists of URLs. This Actor is a Python alternative to Cheerio Scraper.

Apify

991

4.2

Website Job Extractor (Browser)

santamaria-automations/website-job-extractor-browser

Extract job listings from JavaScript-rendered career pages (React, Vue, Angular) using AI + Playwright. Companion to the HTTP-only Website Job Extractor. Use it for the ~28% of company sites that need a real browser. Same output format, same quality, same LLM fallback chain.

Alessandro Santamaria

Website Contact Extractor (Browser)

santamaria-automations/website-contact-extractor-browser

Extract team contacts from JavaScript-rendered company websites (React, Vue, Angular) using AI + Playwright. Companion to the HTTP-only Website Contact Extractor. Handles the ~28% of sites that need a real browser. Same output format, same quality, same LLM fallback chain.

Alessandro Santamaria

Noon Product Info Scraper

getdataforme/noon-productInfo-scraper

Project Cheerio Crawler Typescript is a web scraping tool that extracts detailed product data from e-commerce sites using the Cheerio library....

GetDataForMe

Metadata Extractor

jancurn/extract-metadata

A small, efficient Actor that loads a web page, parses its HTML using the Cheerio library, and extracts metadata from the <HEAD> tag, such as the page title, description, and author.

Jan Čurn

1.4K

Newbalance Reviews Scraper

getdataforme/newbalance-reviews-scraper

Project Cheerio Crawler Typescript is a web scraping tool using the Cheerio library to extract structured data from websites. It offers customizable crawling, scalable performance, robust error handling, and exports data in JSON, CSV, or Excel....

GetDataForMe