Vanilla JS Scraper

Pricing

Pay per usage

Scrape the web using familiar JavaScript methods! Crawls websites using raw HTTP requests, parses the HTML with the JSDOM package, and extracts data from the pages using Node.js code. Supports both recursive crawling and lists of URLs. This actor is a non-jQuery alternative to Cheerio Scraper.

Rating

0.0

(0)

Developer

Matthias Stephens

Maintained by Community

Actor stats

Total users

503

Issues response

22 days

Last modified

2 years ago

Requests

requests (array, required)

A static list of URLs to scrape.

For details, see the Start URLs section in the README.

Pseudo-URLs

pseudoUrls (array, optional)

Specifies which of the URLs found by the Link selector should be added to the request queue. A pseudo-URL is a URL with regular expressions enclosed in [] brackets, e.g. http://www.example.com/[.*].

If Pseudo-URLs are omitted, the actor enqueues all links matched by the Link selector.

For details, see Pseudo-URLs in the README.

Default value of this property is []
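
To make the bracket notation concrete, here is a minimal sketch of how a pseudo-URL could be converted to a regular expression. The helper name pseudoUrlToRegExp is hypothetical and not part of the actor; the actor's real matching rules (for example, case sensitivity) may differ, so treat this only as an illustration of the idea that text outside [] is literal and text inside [] is a regular expression.

```javascript
// Hypothetical helper, not part of the actor's API.
// Text outside [] is matched literally; text inside [] is used as a regular expression.
function pseudoUrlToRegExp(pseudoUrl) {
    let pattern = '';
    let regexPart = '';
    let depth = 0; // nesting level of [] brackets

    for (const ch of pseudoUrl) {
        if (ch === '[') {
            if (depth > 0) regexPart += ch; // a nested bracket belongs to the regex
            depth += 1;
        } else if (ch === ']' && depth > 0) {
            depth -= 1;
            if (depth === 0) {
                pattern += `(?:${regexPart})`;
                regexPart = '';
            } else {
                regexPart += ch;
            }
        } else if (depth > 0) {
            regexPart += ch;
        } else {
            // Escape regex metacharacters in the literal part of the URL.
            pattern += ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
        }
    }
    if (depth !== 0) throw new Error(`Unbalanced brackets in pseudo-URL: ${pseudoUrl}`);
    return new RegExp(`^${pattern}$`);
}

const matcher = pseudoUrlToRegExp('http://www.example.com/[.*]');
console.log(matcher.test('http://www.example.com/products/123')); // true
console.log(matcher.test('https://www.example.com/'));            // false (different scheme)
```

Anchoring the pattern with ^ and $ means the whole URL has to match, which is why the https URL is rejected.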

Link selector

linkSelector (string, optional)

A CSS selector specifying which links on the page (<a> elements with an href attribute) should be followed and added to the request queue. To filter the links added to the queue, use the Pseudo-URLs field.

If the Link selector is empty, the page links are ignored.

For details, see the Link selector in the README.

Page function

pageFunction (string, required)

A JavaScript function that is executed server-side in Node.js 12 for every page loaded. Use it to scrape data from the page, perform actions, or add new URLs to the request queue.

For details, see Page function in the README.
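
As a rough illustration, a Page function might look like the sketch below. The shape of the context object is an assumption here: it is taken to expose document (the page parsed by JSDOM) and request (with a url property), and the returned object is assumed to be what gets stored as the result. Check the README for the actual property names before relying on any of this.

```javascript
// Sketch of a Page function.
// ASSUMPTION: `context.document` is the JSDOM-parsed page and `context.request.url`
// is the URL being processed. Neither name is confirmed by the input schema.
async function pageFunction(context) {
    const { document, request } = context;

    // Standard DOM methods; no jQuery-style wrapper is involved.
    const headings = Array.from(
        document.querySelectorAll('h2'),
        (el) => el.textContent.trim(),
    );

    return {
        url: request.url,
        title: document.title,
        headings,
    };
}
```

Because the function only uses standard DOM calls, it can be exercised locally against any object that implements querySelectorAll and title.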

Pre-navigation hooks

preNavigationHooks (string, optional)

Async functions that are evaluated sequentially before the navigation. Good for setting additional cookies or browser properties before navigation. Each function accepts two parameters, crawlingContext and requestAsBrowserOptions; the latter is passed to the requestAsBrowser() function the crawler calls to navigate.
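
A hook of this kind could be sketched as follows. The two parameter names come from the description above; the headers property on requestAsBrowserOptions and the cookie and language values are assumptions made for illustration.

```javascript
// Sketch of a preNavigationHooks value: an array of async functions.
// ASSUMPTION: requestAsBrowserOptions accepts a `headers` object; the values are made up.
const preNavigationHooks = [
    async (crawlingContext, requestAsBrowserOptions) => {
        requestAsBrowserOptions.headers = {
            ...requestAsBrowserOptions.headers, // keep headers that are already set
            Cookie: 'session=example-value',
            'Accept-Language': 'en-US',
        };
    },
];
```

Since the input field is a string, the array would be entered as source text rather than as live functions.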

Post-navigation hooks

postNavigationHooks (string, optional)

Async functions that are evaluated sequentially after the navigation. Good for checking if the navigation was successful. Each function accepts crawlingContext as its only parameter.
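
For example, a hook that checks the HTTP status could be sketched like this. The request and response properties of crawlingContext, and the statusCode field, are assumptions; the description above only confirms that crawlingContext is the single parameter.

```javascript
// Sketch of a postNavigationHooks value.
// ASSUMPTION: crawlingContext exposes `request.url` and `response.statusCode`.
const postNavigationHooks = [
    async (crawlingContext) => {
        const { request, response } = crawlingContext;
        if (response && response.statusCode >= 400) {
            // Throwing marks the navigation as unsuccessful.
            throw new Error(`Navigation to ${request.url} failed with status ${response.statusCode}`);
        }
    },
];
```

Whether a thrown error triggers a retry depends on the crawler's error handling, which this page does not describe.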

Proxy configuration

proxy (object, optional)

Specifies proxy servers that will be used by the scraper in order to hide its origin.

For details, see Proxy configuration in the README.

Default value of this property is {"useApifyProxy":false}

Debug log

debug (boolean, optional)

Include debug messages in the log?

Default value of this property is false

Max concurrency

maxConcurrency (integer, optional)

Specifies the maximum number of pages that can be processed by the scraper in parallel. The scraper automatically increases and decreases concurrency based on available system resources. This option enables you to set an upper limit, for example to reduce the load on a target web server.

Default value of this property is 50
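
The effect of an upper limit can be illustrated with a small worker pool. This is not the scraper's autoscaling logic, which also reacts to system resources; runWithLimit is a hypothetical helper that only shows how a fixed cap bounds the number of pages in flight.

```javascript
// Hypothetical helper: runs `task` for every item, never more than `maxConcurrency` at once.
async function runWithLimit(items, maxConcurrency, task) {
    const results = new Array(items.length);
    let next = 0; // index of the next unprocessed item

    async function worker() {
        while (next < items.length) {
            const index = next++;
            results[index] = await task(items[index]);
        }
    }

    // Start at least one worker and at most `maxConcurrency` workers.
    const workerCount = Math.max(1, Math.min(maxConcurrency, items.length));
    await Promise.all(Array.from({ length: workerCount }, () => worker()));
    return results;
}
```

Lowering the limit trades throughput for a lighter load on the target server, which is the use case the field description mentions.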

Max request retries

maxRequestRetries (integer, optional)

The maximum number of times the scraper will retry a web page after a page load error or an exception thrown by the Page function.

If set to 0, the page will be considered failed right after the first error.

Default value of this property is 3
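
The retry counting described above can be sketched as one initial attempt followed by up to maxRequestRetries further attempts. The helper processWithRetries is hypothetical and only mirrors the documented behavior; it is not the scraper's implementation.

```javascript
// Hypothetical helper illustrating the retry count semantics.
async function processWithRetries(handler, maxRequestRetries = 3) {
    let lastError;
    // Attempt 0 is the first try; attempts 1..maxRequestRetries are retries.
    for (let attempt = 0; attempt <= maxRequestRetries; attempt++) {
        try {
            return await handler(attempt);
        } catch (err) {
            lastError = err;
        }
    }
    // Every attempt failed, so the page is considered failed.
    throw lastError;
}
```

With maxRequestRetries set to 0 the loop body runs exactly once, matching the statement that the page fails right after the first error.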

Page load timeout

pageLoadTimeoutSecs (integer, optional)

The maximum amount of time the scraper will wait for a web page to load, in seconds. If the web page does not load within this time, it is considered to have failed and will be retried (subject to Max request retries), like any other page load error.

Default value of this property is 60

Page function timeout

pageFunctionTimeoutSecs (integer, optional)

The maximum amount of time the scraper will wait for the Page function to execute, in seconds. It is always a good idea to set this limit so that unexpected behavior in the Page function cannot leave the scraper stuck.

Default value of this property is 60
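
A timeout like this is commonly implemented by racing the work against a timer. The sketch below shows that general technique with a hypothetical withTimeout helper; it is an assumption about how such a limit can work, not a description of the scraper's internals.

```javascript
// Hypothetical helper: rejects if `promise` does not settle within `timeoutSecs` seconds.
function withTimeout(promise, timeoutSecs) {
    let timer;
    const timeout = new Promise((_, reject) => {
        timer = setTimeout(
            () => reject(new Error(`Timed out after ${timeoutSecs} s`)),
            timeoutSecs * 1000,
        );
    });
    // Whichever settles first wins; the timer is cleared either way.
    return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Note that the race only stops waiting; it does not cancel the underlying work, so a runaway Page function could keep consuming resources in the background.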

Ignore SSL errors

ignoreSslErrors (boolean, optional)

If enabled, the scraper will ignore SSL/TLS certificate errors. Use at your own risk.

Default value of this property is false

Additional MIME types

additionalMimeTypes (array, optional)

A JSON array specifying additional MIME content types of web pages to support. By default, the scraper supports the text/html and application/xhtml+xml content types and skips all other resources. For details, see Content types in the README.

Default value of this property is []
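
The content-type check can be pictured as a lookup against the two default types plus whatever the input adds. The function isSupportedContentType is a hypothetical illustration; how the scraper actually parses the Content-Type header is not documented here.

```javascript
// The two defaults are taken from the field description above.
const DEFAULT_MIME_TYPES = ['text/html', 'application/xhtml+xml'];

// Hypothetical helper: decides whether a response would be processed or skipped.
function isSupportedContentType(contentTypeHeader, additionalMimeTypes = []) {
    // Drop parameters such as "; charset=utf-8" and normalize case.
    const mimeType = String(contentTypeHeader || '').split(';')[0].trim().toLowerCase();
    const supported = [...DEFAULT_MIME_TYPES, ...additionalMimeTypes].map((t) => t.toLowerCase());
    return supported.includes(mimeType);
}

console.log(isSupportedContentType('text/html; charset=utf-8'));               // true
console.log(isSupportedContentType('application/json'));                       // false
console.log(isSupportedContentType('application/json', ['application/json'])); // true
```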

Dataset name

datasetName (string, optional)

Name or ID of the dataset that will be used for storing results. If left empty, the default dataset of the run will be used.

Key-value store name

keyValueStoreName (string, optional)

Name or ID of the key-value store that will be used for storing records. If left empty, the default key-value store of the run will be used.

Custom data

customData (object, optional)

A custom JSON object that is passed to the Page function as context.customData. This setting is useful when invoking the scraper via API, in order to pass some arbitrary parameters to your code.

Default value of this property is {}
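
As a sketch of how this might be used, the Page function below reads two made-up parameters from context.customData. The customData property is the one named above; context.document and the .product-name selector are assumptions for illustration only.

```javascript
// Example value for the Custom data input field (made up for this sketch).
const customData = { category: 'laptops', maxItems: 2 };

// Sketch of a Page function that reads the parameters back.
// ASSUMPTION: `context.document` is the parsed page; `.product-name` is a hypothetical selector.
async function pageFunction(context) {
    const { category, maxItems } = context.customData;
    const names = Array.from(
        context.document.querySelectorAll('.product-name'),
        (el) => el.textContent.trim(),
    );
    return { category, products: names.slice(0, maxItems) };
}
```

Keeping run-specific values in customData lets the same Page function be reused across API calls with different parameters.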
