
Playwright Scraper
Pricing
Pay per usage

Crawls websites with the headless Chromium, Chrome, or Firefox browser and Playwright library using a provided server-side Node.js code. Supports both recursive crawling and a list of URLs. Supports login to a website.
Enqueued links not processed
Closed
We've seen several websites where only the homepage (the start URL) is scraped: links get enqueued, but they are never followed or processed, and the Actor stops once it is done with the homepage.
There are no errors or warnings in the logs.
Here are some run IDs where it happened:
- DlbeLbxFkz3lpwGi4
- wIOBtIvVin5ntFkG8
- rRp1RWW1A7vQ8vhWc
Hello, and thank you for your interest in this Actor!
A large part of what you implemented in your Page function is already built into Playwright Scraper (or Crawlee).
The following snippet does the same thing as your implementation with transformRequestFunction:
await enqueueLinks({
    selector: 'a',
    strategy: 'same-domain',
    exclude: [
        /\.(docx?|pdf|webp|jpe?g|gif|png|php|asp)$/i,
        /blog|archive|arhiv/i,
    ],
});
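For comparison, here is a minimal sketch of what a hand-written filter with the same effect as the exclude option might look like. The original Page function is not shown in this thread, so the EXCLUDED constant and the function body are hypothetical; only the two regular expressions are taken from the snippet above. It relies on Crawlee skipping a link when transformRequestFunction returns a falsy value.

```javascript
// Hypothetical stand-in for the poster's filter (their code is not shown here).
// The two patterns are copied from the `exclude` option in the reply.
const EXCLUDED = [
    /\.(docx?|pdf|webp|jpe?g|gif|png|php|asp)$/i,
    /blog|archive|arhiv/i,
];

// Receives the request options for one link.
// Returning false tells Crawlee to skip the link; returning the options keeps it.
function transformRequestFunction(req) {
    if (EXCLUDED.some((pattern) => pattern.test(req.url))) {
        return false;
    }
    return req;
}
```

A function like this would be passed as the transformRequestFunction option of enqueueLinks. The exclude option expresses the same rule declaratively, which is why the reply suggests it.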
If you want to stay with your implementation, you absolutely can. The issue is that, by default, Crawlee uses strategy: 'same-hostname' (source here), which matches 0 links on the first page, so the Actor finishes early. You can pass strategy: 'all' to enqueueLinks so that Crawlee doesn't filter the links prematurely and passes all of them to your transform function:
await enqueueLinks({
    selector: 'a',
    strategy: 'all',
    transformRequestFunction: (req) => {
        // your transformRequestFunction
    },
});
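To illustrate how a hostname-based strategy can end up matching 0 links, here is a simplified model of the three strategies named in this thread. It is not Crawlee's actual implementation: matchesStrategy and registrableDomain are hypothetical helpers, the domain check naively takes the last two hostname labels, and the example.com URLs are made up. The thread does not say why no links matched on the affected sites. A page served from a bare domain whose links all point to the www subdomain is one plausible case.

```javascript
// Simplified model of the enqueue strategies, NOT Crawlee's real code.
function registrableDomain(hostname) {
    // Naive assumption: the last two labels form the domain (wrong for e.g. "co.uk").
    return hostname.split('.').slice(-2).join('.');
}

function matchesStrategy(strategy, pageUrl, linkUrl) {
    const page = new URL(pageUrl);
    const link = new URL(linkUrl, pageUrl); // resolves relative links against the page
    switch (strategy) {
        case 'all':
            return true;
        case 'same-hostname':
            return link.hostname === page.hostname;
        case 'same-domain':
            return registrableDomain(link.hostname) === registrableDomain(page.hostname);
        default:
            throw new Error(`Unknown strategy: ${strategy}`);
    }
}

// Hypothetical page: loaded from example.com, every link points to www.example.com.
const pageUrl = 'https://example.com/';
const links = [
    'https://www.example.com/about',
    'https://www.example.com/contact',
];

for (const strategy of ['same-hostname', 'same-domain', 'all']) {
    const kept = links.filter((link) => matchesStrategy(strategy, pageUrl, link));
    console.log(`${strategy}: ${kept.length} of ${links.length} links kept`);
}
// same-hostname: 0 of 2 links kept
// same-domain: 2 of 2 links kept
// all: 2 of 2 links kept
```

In this model, a custom transform function never sees the links under 'same-hostname', because they are dropped before it would run. That is the situation the advice to pass strategy: 'all' is meant to avoid.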
I'll close this issue now, but feel free to ask additional questions if you have any. Cheers!