Pricing

Pay per usage

Puppeteer Scraper

Developed by

Apify

Crawls websites with headless Chrome and the Puppeteer library using provided server-side Node.js code. This crawler is an alternative to apify/web-scraper that gives you finer control over the process. Supports both recursive crawling and lists of URLs. Supports logging in to websites.

5.0 (5)

191

Total users

8.5K

Monthly users

988

Runs succeeded

>99%

Issues response

42 days

Last modified

2 months ago

Developer tools

Open source

Puppeteer Scraper Actor Not Executing Requests (Despite Valid Start URL & Page Function)

Closed

sdejoke opened this issue

Hi team, I’m encountering a persistent issue with the Apify Puppeteer Scraper actor where no requests are being successfully executed, despite a valid startUrls input and a working pageFunction.

Here’s a summary of what I’ve tried:

• I’m entering my URL manually under startUrls, e.g.: https://www.glassdoor.com/Reviews/Google-Reviews-E9079.htm
• The pageFunction is simple and valid (e.g., page.title()).
• I’ve also attempted to configure headers and user-agent via preNavigationHooks.
• Despite this, I either get 0 requests processed, or in some cases, 403 Forbidden errors (even when trying with Apify Proxy).
• I already have login logic and key-value store in place but I’m unsure if this is being properly connected during execution.
• There’s no indication that the crawler is navigating or queuing additional URLs, even though the actor says it’s configured correctly.
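For reference, a setup like the one described above might look roughly like the sketch below. It is a minimal illustration, not the poster's actual code: the function name `setHeadersHook`, the `USER_AGENT` value, and the `Accept-Language` header are placeholders chosen for the example. It assumes the page function receives a context containing the Puppeteer `page` and the current `request`, and that each pre-navigation hook receives the crawling context and the navigation options.

```javascript
// Placeholder user-agent string (an assumption for illustration only).
const USER_AGENT =
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
    '(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36';

// Page function: reads the page title and returns it with the request URL.
async function pageFunction(context) {
    const { page, request } = context;
    const title = await page.title();
    return { url: request.url, title };
}

// Pre-navigation hook (hypothetical name): sets the user agent and an
// extra header on the Puppeteer page before navigation starts.
async function setHeadersHook(crawlingContext, gotoOptions) {
    const { page } = crawlingContext;
    await page.setUserAgent(USER_AGENT);
    await page.setExtraHTTPHeaders({ 'Accept-Language': 'en-US,en;q=0.9' });
}
```

In the Actor input, hooks like this would be supplied as an array of functions in the preNavigationHooks field. Note that setting headers this way changes what the browser sends but does not by itself defeat anti-bot protection.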

Here’s the log excerpt from my most recent run:

requestsFinished: 0
requestsFailed: 1
403 status code

I’m not sure if it’s:

• A problem with the actor’s config parsing (e.g., from JSON input)
• A bug with pre-navigation hooks
• Or if Glassdoor is aggressively blocking even with proxies and login steps.

Could someone on the team help debug this setup, or confirm what I might be missing?

Jindřich Bär (jindrich.bar)

Hello, and thank you for the detailed report!

A 403 status code generally means the target website is actively blocking the request, even if you're using proxies or have login logic in place. Glassdoor is known to have strong anti-bot protection. The Puppeteer Scraper Actor may simply lack the stealth and anti-bot capabilities needed to access Glassdoor reliably. In cases like this, a third-party solution may work better.
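The reasoning here can be sketched as a small triage helper. This is a hypothetical function for illustration, not part of the Actor or any Apify library; the field names mirror the log excerpt in the report, and the `statusCodes` array and the returned labels are assumptions made for the example.

```javascript
// Hypothetical triage helper: maps run statistics to a rough diagnosis.
// Only 403 is treated as a blocking signal, since that is the code
// discussed in this thread.
function diagnoseRun({ requestsFinished, requestsFailed, statusCodes = [] }) {
    if (requestsFinished === 0 && requestsFailed === 0) {
        // Nothing was attempted: points at input or configuration.
        return 'nothing-attempted';
    }
    if (statusCodes.includes(403)) {
        // The request reached the site and was refused.
        return 'likely-blocked';
    }
    if (requestsFailed > 0) {
        return 'other-failure';
    }
    return 'ok';
}

// The run from the report: one request failed with a 403.
console.log(diagnoseRun({ requestsFinished: 0, requestsFailed: 1, statusCodes: [403] }));
// prints "likely-blocked"
```

A failed request with a 403 shows that the start URL was parsed and the request was actually sent, which is why the config-parsing and hook hypotheses are less likely than blocking by the site.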

We recommend checking out some ready-made solutions in the Apify Store that are already designed for Glassdoor and similar sites (link to store).

These may include tested workarounds like proper headers, session handling, or stealth browser behavior.

Since this is not an issue with the Actor itself, we’ll go ahead and close this ticket. Feel free to open a new one if you need help with another setup!

Playwright Scraper

apify/playwright-scraper

Crawls websites with the headless Chromium, Chrome, or Firefox browser and Playwright library using a provided server-side Node.js code. Supports both recursive crawling and a list of URLs. Supports login to a website.

Apify

3.6

Cheerio Scraper

apify/cheerio-scraper

Crawls websites using raw HTTP requests, parses the HTML with the Cheerio library, and extracts data from the pages using a Node.js code. Supports both recursive crawling and lists of URLs. This actor is a high-performance alternative to apify/web-scraper for websites that do not require JavaScript.

Apify

4.7

Web Scraper

apify/web-scraper

Crawls arbitrary websites using a web browser and extracts structured data from web pages using a provided JavaScript function. The Actor supports both recursive crawling and lists of URLs, and automatically manages concurrency for maximum performance.

Apify

90K

4.4

Example Puppeteer

apify/example-puppeteer

Example showing how to use headless Chromium with Puppeteer to open a web page, determine its dimensions, save a screenshot, and print the page to PDF. This actor must use images with Puppeteer (Node.js 8 + Puppeteer on Debian).

Apify

410

4.6

Camoufox Scraper

apify/camoufox-scraper

Crawls websites with stealthy Camoufox browser and Playwright library using a provided server-side Node.js code. Supports both recursive crawling and a list of URLs. Supports login to a website.

Apify

Vanilla JS Scraper

mstephen190/vanilla-js-scraper

Scrape the web using familiar JavaScript methods! Crawls websites using raw HTTP requests, parses the HTML with the JSDOM package, and extracts data from the pages using Node.js code. Supports both recursive crawling and lists of URLs. This actor is a non jQuery alternative to CheerioScraper.

Matthias Stephens

471

Example Code Runner (Puppeteer)

apify/example-code-runner-puppeteer

Generic Actor to run code examples from the documentation via "Run on Apify" links.

Apify

565

4.3

bcv-tasa-oficial

grupoaceivzla/bcv-tasa-oficial

Grupo ACEI

Website Checker Runner Puppeteer

lukaskrivka/website-checker-puppeteer

Checks the provided website using Puppeteer. This is a low level runner, most likely you want to use the high level master actor - https://apify.com/lukaskrivka/website-checker

Lukáš Křivka

199

HTML Scraper pro

scrapingxpert/html-scraper-pro

The HTML Scraper Pro is a powerful tool designed to extract the HTML source code and metadata from websites. It uses advanced web scraping techniques to retrieve the full HTML content of web pages, the page title, and the HTTP status code. This tool is ideal for data extraction, website analysis, and archiving.