Puppeteer Scraper

Pricing

Pay per usage

Developed by Apify

Maintained by Apify

Crawls websites with headless Chrome and the Puppeteer library using provided server-side Node.js code. This crawler is an alternative to apify/web-scraper that gives you finer control over the process. Supports both recursive crawling and lists of URLs. Supports logging in to websites.

5.0 (5)

191

Total users: 8.5K
Monthly users: 988
Runs succeeded: >99%
Issues response: 42 days
Last modified: 2 months ago

Puppeteer Scraper Actor Not Executing Requests (Despite Valid Start URL & Page Function)

Closed

sdejoke opened this issue a month ago

Hi team, I’m encountering a persistent issue with the Apify Puppeteer Scraper actor where no requests are being successfully executed, despite a valid startUrls input and a working pageFunction.

Here’s a summary of what I’ve tried:
• I’m entering my URL manually under startUrls, e.g.: https://www.glassdoor.com/Reviews/Google-Reviews-E9079.htm
• The pageFunction is simple and valid (e.g., page.title()).
• I’ve also attempted to configure headers and user-agent via preNavigationHooks.
• Despite this, I either get 0 requests processed, or in some cases, 403 Forbidden errors (even when trying with Apify Proxy).
• I already have login logic and key-value store in place but I’m unsure if this is being properly connected during execution.
• There’s no indication that the crawler is navigating or queuing additional URLs, even though the actor says it’s configured correctly.
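For reference, a setup like the one described above could look roughly like the sketch below. It is not the reporter's actual code: the header values and the user-agent string are hypothetical, and it assumes the hook receives a context object exposing a Puppeteer `page`, and that the page function receives `page` and `request`.

```javascript
// Hypothetical values, for illustration only (not taken from the report).
const USER_AGENT =
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36';
const EXTRA_HEADERS = { 'Accept-Language': 'en-US,en;q=0.9' };

// Pre-navigation hook: runs before the browser navigates to the request URL.
// Assumes the first argument carries a Puppeteer `page` object.
const preNavigationHooks = [
  async ({ page }) => {
    await page.setUserAgent(USER_AGENT);
    await page.setExtraHTTPHeaders(EXTRA_HEADERS);
  },
];

// Page function: runs only after a page has loaded successfully.
// Returns one result record containing the URL and the page title.
async function pageFunction({ page, request }) {
  const title = await page.title();
  return { url: request.url, title };
}
```

Setting a user-agent and headers this way only changes what the browser sends. It does not by itself get past a site that blocks on other signals, and if the request is rejected before the page loads, the page function is never reached.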

Here’s the log excerpt from my most recent run:

requestsFinished: 0
requestsFailed: 1
403 status code

I’m not sure if it’s:
• A problem with the actor’s config parsing (e.g., from JSON input)
• A bug with pre-navigation hooks
• Or if Glassdoor is aggressively blocking even with proxies and login steps.

Could someone on the team help debug this setup, or confirm what I might be missing?

jindrich.bar

Hello, and thank you for the detailed report!

A 403 status code generally means the target website is actively blocking the request, even if you're using proxies or have login logic in place. Glassdoor is known to have strong anti-bot protection, and the Puppeteer Scraper Actor's stealth and anti-bot capabilities may simply not be enough to access it reliably. In cases like this, a third-party solution may work better.
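As a rough illustration of this reasoning, the sketch below triages a run from its counters and the last status code seen. The function name, the field `lastStatusCode`, the result labels, and the choice of 401/403/429 as "blocking" codes are all assumptions made for this example; they are not part of the Actor.

```javascript
// Status codes treated as signs of blocking in this sketch (an assumption).
const BLOCKING_STATUS_CODES = new Set([401, 403, 429]);

// Hypothetical helper: classify a run from its request counters and last status code.
function triageRun({ requestsFinished, requestsFailed, lastStatusCode }) {
  // Nothing was even attempted: look at the input (e.g. startUrls) first.
  if (requestsFinished === 0 && requestsFailed === 0) {
    return 'no-requests-attempted';
  }
  // A request was made and the site refused it: the input was parsed fine.
  if (requestsFailed > 0 && BLOCKING_STATUS_CODES.has(lastStatusCode)) {
    return 'blocked-by-target';
  }
  if (requestsFailed > 0) {
    return 'failed-other';
  }
  return 'ok';
}

// The run from the report: one failed request with a 403.
console.log(triageRun({ requestsFinished: 0, requestsFailed: 1, lastStatusCode: 403 }));
// prints "blocked-by-target"
```

A failed request with a 403 shows that the start URL was read and a request was actually sent, which points away from input parsing and toward blocking by the site.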

We recommend checking out some ready-made solutions in the Apify Store that are already designed for Glassdoor and similar sites (link to store).

These may include tested workarounds like proper headers, session handling, or stealth browser behavior.

Since this is not an issue with the Actor itself, we’ll go ahead and close this ticket. Feel free to open a new one if you need help with another setup!