Puppeteer Scraper is the most powerful scraper tool in our arsenal (aside from developing your own actors). It uses the Puppeteer library to programmatically control a headless Chrome browser and can make it do almost anything. If Web Scraper does not cut it, Puppeteer Scraper is what you need.

Puppeteer is a Node.js library, so knowledge of Node.js and its paradigms is expected when working with the Puppeteer Scraper.

If you need a faster or simpler tool, see Cheerio Scraper for speed or Web Scraper for simplicity.

Input

Input is provided via the pre-configured UI. For more information on the available options, see the tooltips in the UI or the input schema.

Page function

The page function is a single JavaScript function that lets you control the scraper's operation, manipulate the visited pages and extract data as needed. It is invoked with a `context` object containing the following properties:

```javascript
const context = {
    // USEFUL DATA
    input, // Unaltered original input as parsed from the UI
    env, // Contains information about the run, such as actorId or runId
    customData, // Value of the 'Custom data' scraper option

    // EXPOSED OBJECTS
    page, // Reference to the Puppeteer.Page
    request, // Apify.Request object
    response, // Response object holding the status code and headers
    puppeteerPool, // Reference to the Apify.PuppeteerPool instance running the browsers
    autoscaledPool, // Reference to the Apify.AutoscaledPool instance managing concurrency
    globalStore, // In-memory store that can be used to share data across pageFunction invocations
    log, // Reference to Apify.utils.log
    Apify, // Reference to the full power of the Apify SDK

    // EXPOSED FUNCTIONS
    setValue, // Reference to the Apify.setValue() function
    getValue, // Reference to the Apify.getValue() function
    saveSnapshot, // Saves a screenshot and full HTML of the current page to the key-value store
    skipLinks, // Prevents enqueueing more links via Pseudo URLs on the current page
    enqueueRequest, // Adds a page to the request queue
};
```
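A page function usually destructures only the properties it needs. The sketch below is a minimal example, assuming the default `async function pageFunction(context)` signature; it uses Puppeteer's `page.title()`, and the `label` key in Custom data is hypothetical.

```javascript
// Minimal page function sketch: pull a few properties out of `context`
// and return one object, which becomes one dataset item.
async function pageFunction(context) {
    const { page, request, log, customData } = context;

    // Puppeteer's page.title() resolves to the text of the document's <title>.
    const title = await page.title();
    log.info(`Scraped ${request.url}`);

    return {
        url: request.url,
        title,
        // "label" is a hypothetical Custom data key used for illustration.
        label: customData ? customData.label : undefined,
    };
}
```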

`context`

The following tables describe the context object in more detail.

Data structures

| Argument | Type | Description |
| --- | --- | --- |
| `input` | `Object` | Input as it was received from the UI. Each `pageFunction` invocation gets a fresh copy, so you cannot modify the input by changing the values in this object. |
| `env` | `Object` | A map of all the relevant environment variables that you may want to use. See the `Apify.getEnv()` function for a preview of the structure and full documentation. |
| `customData` | `Object` | Since the input UI is fixed, it does not support adding other fields that specific use cases may need. If you need to pass arbitrary data to the scraper, use the Custom data input field; its contents will be available under the `customData` context key. |
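The sketch below reads both `input` and `customData`. It assumes the input contains the Puppeteer Scraper's `startUrls` array, and `minRating` is a hypothetical key you would place in the Custom data field, for example `{ "minRating": 4 }`.

```javascript
// Sketch: read the original input together with arbitrary Custom data.
async function pageFunction(context) {
    const { input, customData, request } = context;

    // Treat `input` as read-only: each invocation gets a fresh copy, so
    // changing it here would not affect any other invocation.
    const startUrlCount = Array.isArray(input.startUrls) ? input.startUrls.length : 0;

    // "minRating" is a hypothetical Custom data key; fall back to 0 if it is missing.
    const minRating = Number((customData && customData.minRating) || 0);

    return { url: request.url, startUrlCount, minRating };
}
```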

Functions

The context object provides several helper functions that make scraping and saving data easier and more streamlined. All of them are async, so make sure to `await` their invocations.

| Function | Arguments | Description |
| --- | --- | --- |
| `setValue` | `(key: string, data: Object, options: Object)` | Saves data to the default key-value store. See the full documentation of the `Apify.setValue()` function. |
| `getValue` | `(key: string)` | Reads data from the default key-value store. See the full documentation of the `Apify.getValue()` function. |
| `saveSnapshot` | `()` | Saves a snapshot of the current page's HTML and its screenshot into the default key-value store. Each snapshot overwrites the previous one, and invocations are throttled if made more than once in 2 seconds to prevent abuse, so don't call it for every single request. The screenshot is stored under the `SNAPSHOT-SCREENSHOT` key and the HTML under the `SNAPSHOT-BODY` key. |
| `skipLinks` | `()` | With each invocation of the `pageFunction`, the scraper attempts to extract new URLs from the page using the Link selector and Pseudo URLs provided in the input UI. To prevent this for a given page, call `skipLinks` and no URLs from that page will be added to the queue. |
| `enqueueRequest` | `(request: Request\|Object, options: Object)` | Enqueues a specific URL manually, instead of automatically via a combination of a Link selector and a Pseudo URL. It accepts a plain object with the structure needed to construct a `Request` object, but in practice you only need a URL: `{ url: 'https://www.example.com' }`. |
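The helper functions can be combined in a single page function. This is an illustrative sketch: the `LAST-URL` key and the `/archive/` rule are hypothetical, and reading then writing the same key is not safe when several pages run concurrently.

```javascript
// Sketch combining the context helper functions.
async function pageFunction(context) {
    const { page, request, getValue, setValue, saveSnapshot, skipLinks, enqueueRequest } = context;

    const title = await page.title();
    if (!title) {
        // Snapshots are throttled and overwrite each other, so only take
        // one when something looks wrong.
        await saveSnapshot();
    }

    const { pathname } = new URL(request.url);
    if (pathname.includes('/archive/')) {
        // Hypothetical rule: do not follow links found on archive pages...
        await skipLinks();
    } else {
        // ...but enqueue the archive index manually; only `url` is required.
        // Repeated enqueues of the same URL are deduplicated by the request queue.
        await enqueueRequest({ url: new URL('/archive/', request.url).href });
    }

    // Read the previous value (no record yet on the first page), then overwrite it.
    const previous = await getValue('LAST-URL');
    await setValue('LAST-URL', { url: request.url });

    return { url: request.url, title, previousUrl: previous ? previous.url : null };
}
```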

Class instances and namespaces

The following are either class instances or namespaces, which is just a way of saying objects with functions on them.

Page

Reference to the Puppeteer `Page` object, which enables you to use the full power of Puppeteer in your page functions.
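For example, the Puppeteer `Page` methods `waitForSelector()` and `$$eval()` can extract content rendered by client-side JavaScript. In this sketch the `.product` and `.product h2` selectors are hypothetical.

```javascript
// Sketch: extract text from client-side rendered elements with Puppeteer.
async function pageFunction(context) {
    const { page } = context;

    // ".product" is a hypothetical selector; waitForSelector resolves once a
    // matching element appears in the DOM (or rejects after a timeout).
    await page.waitForSelector('.product');

    // $$eval runs the callback inside the browser on all matching elements
    // and sends its serializable result back to Node.js.
    const names = await page.$$eval('.product h2', (els) =>
        els.map((el) => el.textContent.trim()));

    // Returning an array of objects is allowed, as described under Output.
    return names.map((name) => ({ name }));
}
```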

Request

Apify uses a request object to represent metadata about the currently crawled page, such as its URL or the number of retries. See the Request class for a preview of the structure and full documentation.
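A common pattern is to tag requests through the `userData` object of `Apify.Request` and branch on it. In this sketch the `label` key, its `LIST`/`DETAIL` values and the `/detail` URL are hypothetical conventions, not something the scraper requires.

```javascript
// Sketch: route pages by a label kept in request.userData.
async function pageFunction(context) {
    const { request, enqueueRequest } = context;

    // userData is a free-form object on Apify.Request; the "label" key and
    // the LIST/DETAIL values are a convention chosen for this sketch.
    const label = (request.userData && request.userData.label) || 'LIST';

    if (label === 'LIST') {
        // Hypothetical detail page URL, resolved against the current page.
        const detailUrl = new URL('/detail', request.url).href;
        await enqueueRequest({ url: detailUrl, userData: { label: 'DETAIL' } });
        return { url: request.url, label };
    }

    // retryCount is the number of times this request has been retried.
    return { url: request.url, label, retryCount: request.retryCount };
}
```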

Response

The response object is produced by Puppeteer. Currently, we only pass the HTTP status code and the response headers to the context.
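The status code and headers make it possible to handle error pages explicitly. This sketch assumes they are exposed as `response.status` (a number) and `response.headers` (a plain object with lower-cased header names, as Puppeteer returns them); check the actual object in your run before relying on these property names.

```javascript
// Sketch: branch on the HTTP status code and read a response header.
// Assumption: the context exposes them as response.status (number) and
// response.headers (plain object with lower-cased header names).
async function pageFunction(context) {
    const { response, request, log } = context;

    if (response.status >= 400) {
        log.warning(`Got HTTP ${response.status} for ${request.url}`);
        return { url: request.url, ok: false, status: response.status };
    }

    return {
        url: request.url,
        ok: true,
        status: response.status,
        contentType: response.headers['content-type'] || null,
    };
}
```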

PuppeteerPool

A reference to the running instance of the `PuppeteerPool` class. See the Apify SDK docs for more information.

AutoscaledPool

A reference to the running instance of the `AutoscaledPool` class. See the Apify SDK docs for more information.

Global Store

`globalStore` represents an instance of a very simple in-memory store that is not scoped to an individual `pageFunction` invocation. This lets you easily share global data such as API responses, tokens and so on. Since the stored data need to cross from the browser to the Node.js process, they cannot be just any kind of data, only JSON-stringifiable objects. You cannot store DOM objects, functions, circular objects and so on.

globalStore in Puppeteer Scraper is just a Map.
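Because `globalStore` is a `Map`, the usual `get()` and `set()` methods apply. The sketch below counts processed pages per hostname under a hypothetical `hostCounts` key, storing only a JSON-friendly plain object.

```javascript
// Sketch: count processed pages per hostname in the shared globalStore Map.
async function pageFunction(context) {
    const { globalStore, request } = context;
    const host = new URL(request.url).hostname;

    // There is no await between get() and set(), so in single-threaded
    // Node.js no other invocation can interleave with this read-modify-write.
    const counts = globalStore.get('hostCounts') || {};
    counts[host] = (counts[host] || 0) + 1;
    globalStore.set('hostCounts', counts);

    return { url: request.url, host, pagesSeenForHost: counts[host] };
}
```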

Log

`log` is a reference to `Apify.utils.log`. You can use any of the logging methods, such as `log.info` or `log.exception`. `log.debug` is special: its messages appear in the scraper's log only when the Debug log input option is enabled.
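The sketch below uses the three logging methods mentioned above together with Puppeteer's `page.$eval()`, which rejects when no element matches the selector. The `h1` selector is just an example.

```javascript
// Sketch: debug, info and exception logging around a Puppeteer lookup.
async function pageFunction(context) {
    const { page, request, log } = context;

    // Printed only when the "Debug log" input option is enabled.
    log.debug('Processing page', { url: request.url });

    try {
        // page.$eval rejects if no element matches the selector.
        const heading = await page.$eval('h1', (el) => el.textContent.trim());
        log.info(`Found a heading on ${request.url}`);
        return { url: request.url, heading };
    } catch (err) {
        // log.exception(error, message) logs the message with the error details.
        log.exception(err, `No <h1> found on ${request.url}`);
        return { url: request.url, heading: null };
    }
}
```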

Apify

A reference to the full power of the Apify SDK. See the docs for more information and all the available functions and classes.

Caution: Since we're making the full SDK available, and Puppeteer Scraper runs using the SDK, some edge case manipulations may lead to inconsistencies. Use Apify with caution and avoid making global changes unless you know what you're doing.
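A low-risk way to use the SDK is to call self-contained utilities that do not change global state. This sketch uses `Apify.utils.sleep(millis)`; the 1-second delay for a lazily rendered page is an arbitrary, hypothetical choice.

```javascript
// Sketch: call an SDK utility from inside the page function.
async function pageFunction(context) {
    const { Apify, page, request } = context;

    // Apify.utils.sleep(millis) resolves after the given delay; here it gives
    // a (hypothetical) lazily rendered page a moment to finish.
    await Apify.utils.sleep(1000);

    // Puppeteer's page.content() returns the full current HTML.
    const html = await page.content();
    return { url: request.url, htmlLength: html.length };
}
```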

Output

Output is a dataset containing extracted data for each scraped page. To save data into the dataset, return an Object or an Object[] from the pageFunction.

Dataset

For each of the scraped URLs, the dataset contains an object with results and some metadata. If you were scraping the HTML `<title>` of Apify and returning the following object from the `pageFunction`:

```javascript
return {
  title: "Web Scraping, Data Extraction and Automation - Apify"
}
```

it would look like this:

```json
{
  "title": "Web Scraping, Data Extraction and Automation - Apify",
  "#error": false,
  "#debug": {
    "requestId": "fvwscO2UJLdr10B",
    "url": "https://apify.com",
    "loadedUrl": "https://apify.com/",
    "method": "GET",
    "retryCount": 0,
    "errorMessages": null,
    "statusCode": 200
  }
}
```

You can remove the metadata, and any results containing only metadata, by selecting the Clean items option when downloading the dataset.

The result will look like this:

```json
{
  "title": "Web Scraping, Data Extraction and Automation - Apify"
}
```
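If you already have the raw items, you can apply the same cleanup yourself. The sketch below assumes the rule documented for the Apify API's `clean` parameter: fields whose names start with `#` are hidden, and items left empty are skipped. `cleanItems` is a hypothetical helper name.

```javascript
// Sketch: mimic "Clean items" on downloaded dataset items by dropping
// hidden fields (names starting with "#") and skipping items left empty.
function cleanItems(items) {
    return items
        .map((item) => Object.fromEntries(
            Object.entries(item).filter(([key]) => !key.startsWith('#')),
        ))
        .filter((item) => Object.keys(item).length > 0);
}

const raw = [
    {
        title: 'Web Scraping, Data Extraction and Automation - Apify',
        '#error': false,
        '#debug': { url: 'https://apify.com', statusCode: 200 },
    },
    // An item containing only metadata is removed entirely.
    { '#error': true, '#debug': { url: 'https://apify.com/broken' } },
];

console.log(JSON.stringify(cleanItems(raw)));
// → [{"title":"Web Scraping, Data Extraction and Automation - Apify"}]
```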
