Playwright Scraper

Pricing

Pay per usage

apify/playwright-scraper

Developed by

Apify

Maintained by Apify

Crawls websites with a headless Chromium, Chrome, or Firefox browser and the Playwright library, using server-side Node.js code that you provide. Supports both recursive crawling and a list of URLs. Supports login to a website.

4.3 (7)


31

Monthly users

142

Runs succeeded

98%

Response time

3.1 days

Last modified

10 months ago


When trying to scrape a sitemap.xml - getting back a "document.body is null" error

Closed
oren_clearya opened this issue
5 months ago

When I run the Playwright Scraper Actor with a start URL that points to a sitemap XML file, I get the following error:

2024-10-19T14:54:41.339Z DEBUG PlaywrightCrawler:AutoscaledPool: scaling up {"oldConcurrency":2,"newConcurrency":3,"systemStatus":{"isSystemIdle":true,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":0},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.6,"actualRatio":0},"cpuInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":0},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":0}}}
2024-10-19T14:54:49.152Z WARN  PlaywrightCrawler: Reclaiming failed request back to the list or queue. page.evaluate: document.body is null
2024-10-19T14:54:49.153Z @debugger eval code line 226 > eval:1:7
2024-10-19T14:54:49.154Z evaluate@debugger eval code:228:17
2024-10-19T14:54:49.155Z @debugger eval code:1:44
2024-10-19T14:54:49.155Z
2024-10-19T14:54:49.156Z     at CrawlerSetup._requestHandler (/home/myuser/dist/internals/crawler_setup.js:379:35) {"id":"vwv0onJJ2YlCPdo","url":"https://apify.com/sitemap.xml","retryCount":1}

It seems to fail before reaching the page function itself. However, here is the pageFunction that was used:

async function pageFunction(context) {
  const { page, request, log } = context;

  async function pageEvaluate(context) {
    return {
      url: document.URL,
      html: document.body?.innerHTML ?? document.querySelector('urlset')?.innerHTML,
    };
  }

  let data = await page.evaluate(pageEva... [trimmed]
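
For reference, the null-safe access pattern used in the page function above can be sketched on its own as follows. This is only an illustration under stated assumptions: `extractDocument` and `fakeXmlDoc` are hypothetical names that do not come from the Actor or from Playwright, and the stand-in object only mimics a document whose `body` is null, as can happen for XML documents. It would not help if the error is thrown before the page function runs, which is what the stack trace suggests.

```javascript
// Hypothetical helper (not part of the Actor): null-safe extraction that
// tolerates documents without a <body>, such as XML sitemaps.
// `doc` is any object shaped like a DOM Document.
function extractDocument(doc) {
  // Prefer <body>; fall back to the root element (e.g. <urlset>) if body is null.
  const root = doc.body ?? doc.documentElement ?? null;
  return {
    url: doc.URL,
    html: root ? root.innerHTML : null,
  };
}

// Stand-in for an XML document, so the logic can be checked outside a browser.
const fakeXmlDoc = {
  URL: 'https://example.com/sitemap.xml',
  body: null,
  documentElement: { innerHTML: '<url><loc>https://example.com/</loc></url>' },
};

const result = extractDocument(fakeXmlDoc);
console.log(result);
```

Inside a real page function, the same logic would have to run against the global `document` within the callback passed to `page.evaluate`, since that callback executes in the browser context.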
jindrich.bar

Hello @oren_clearya,

Thank you for bringing this issue to our attention, and I apologize for the delayed response. I attempted to replicate this using a similar setup, and scraping the XML document worked as expected on my end. Unfortunately, the linked run has expired, so I cannot investigate further or reproduce the issue.

It’s possible that the error was caused by a temporary issue or specific conditions during the run. If you encounter this problem again, please provide a fresh run link and any additional context, and we’ll be happy to assist further.

I’ll close this issue for now, but don’t hesitate to create a new one if needed. Thank you for your understanding!

This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage.