Playwright Scraper

Pricing

Pay per usage

Developed by Apify

Maintained by Apify

Crawls websites with a headless Chromium, Chrome, or Firefox browser and the Playwright library, using provided server-side Node.js code. Supports both recursive crawling and a list of URLs. Supports logging in to a website.

4.3 (7)

54

Total users: 2K

Monthly users: 324

Runs succeeded: 97%

Issues response: 7.4 days

Last modified: 2 months ago

Enqueued links not processed

Closed

trivo opened this issue 2 months ago

On several different websites, only the homepage (the start URL) gets scraped. Links are enqueued, but they are never followed or processed, and the Actor stops as soon as it is done with the homepage.

There are no errors or warnings in the logs.

Here are some run IDs where it happened:

  • DlbeLbxFkz3lpwGi4
  • wIOBtIvVin5ntFkG8
  • rRp1RWW1A7vQ8vhWc

jindrich.bar

Hello, and thank you for your interest in this Actor!

A large part of what you implemented in your Page function is already built into Playwright Scraper (or Crawlee).

The following snippet is equivalent to your implementation with transformRequestFunction:

await enqueueLinks({
    selector: 'a',
    strategy: 'same-domain',
    exclude: [
        /\.(docx?|pdf|webp|jpe?g|gif|png|php|asp)$/i,
        /blog|archive|arhiv/i,
    ],
});
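
To see what the two exclude patterns would filter out, here is a minimal standalone sketch in plain Node.js. It does not call Crawlee: the isExcluded helper and the sample URLs are hypothetical, and it assumes each pattern is tested against the full URL string.

```javascript
// Patterns copied from the enqueueLinks() call above.
const excludePatterns = [
    /\.(docx?|pdf|webp|jpe?g|gif|png|php|asp)$/i,
    /blog|archive|arhiv/i,
];

// Hypothetical helper: true when any pattern matches the URL string.
function isExcluded(url) {
    return excludePatterns.some((pattern) => pattern.test(url));
}

// Made-up sample URLs, for illustration only.
const samples = [
    'https://example.com/about',
    'https://example.com/files/report.PDF',
    'https://example.com/blog/post-1',
    'https://example.com/report.pdf?download=1',
];

for (const url of samples) {
    console.log(`${isExcluded(url) ? 'skip' : 'keep'}  ${url}`);
}
```

Under that assumption, the `$` anchor means a file URL followed by a query string (the last sample) is not excluded, and the second pattern matches anywhere in the URL, including the hostname.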

If you want to stay with your implementation, you absolutely can. The issue is that, by default, Crawlee uses strategy: 'same-hostname' (per the Crawlee source), which matches 0 links on the first page, so the Actor finishes early. You can pass strategy: 'all' to enqueueLinks so that Crawlee doesn't filter the links prematurely and instead passes all of them to your transform function:

await enqueueLinks({
    selector: 'a',
    strategy: 'all',
    transformRequestFunction: (req) => {
        // your transformRequestFunction
    },
});
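
The difference between the strategies can be illustrated with a simplified, standalone sketch. This is not Crawlee's implementation: matchesStrategy is a hypothetical helper, the URLs are made up, and the 'same-domain' check naively compares the last two hostname labels (which would be wrong for suffixes such as co.uk). It assumes a start page on example.com whose links point to www.example.com, which is one way 'same-hostname' could end up matching 0 links.

```javascript
// Simplified illustration of the three strategies; NOT Crawlee's actual code.
function matchesStrategy(strategy, pageUrl, linkUrl) {
    const page = new URL(pageUrl);
    const link = new URL(linkUrl);
    // Naive assumption: the registrable domain is the last two hostname labels.
    const baseDomain = (hostname) => hostname.split('.').slice(-2).join('.');

    switch (strategy) {
        case 'all':
            return true;
        case 'same-hostname':
            return page.hostname === link.hostname;
        case 'same-domain':
            return baseDomain(page.hostname) === baseDomain(link.hostname);
        default:
            throw new Error(`Unknown strategy: ${strategy}`);
    }
}

// Made-up URLs, for illustration only.
const pageUrl = 'https://example.com/';
const links = [
    'https://www.example.com/products',
    'https://www.example.com/contact',
    'https://other.org/partner',
];

for (const strategy of ['same-hostname', 'same-domain', 'all']) {
    const kept = links.filter((link) => matchesStrategy(strategy, pageUrl, link));
    console.log(`${strategy}: ${kept.length} of ${links.length} links kept`);
}
```

With these sample URLs, 'same-hostname' keeps none of the links, 'same-domain' keeps the two on www.example.com, and 'all' keeps all three, leaving the filtering to your transformRequestFunction.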

I'll close this issue now, but feel free to ask additional questions if you have any. Cheers!

trivo

18 days ago

Hi Jindřich Bär, I just wanted to say thank you so much for the wonderful explanation and help. Have a great day, and thank you again!