
Playwright Scraper
Pricing: Pay per usage

Crawls websites with headless Chromium, Chrome, or Firefox and the Playwright library, using server-side Node.js code that you provide. Supports both recursive crawling and lists of URLs, and supports logging in to websites.
Rating: 4.3 (7)
Total users: 1.4k
Monthly users: 233
Runs succeeded: 99%
Issue response: 8.9 days
Last modified: 24 days ago
Scraping a sitemap.xml fails with a "document.body is null" error
Closed
When I run the Playwright Actor with a startUrl that points to a sitemap XML file, I get the following error:
2024-10-19T14:54:41.339Z DEBUG PlaywrightCrawler:AutoscaledPool: scaling up {"oldConcurrency":2,"newConcurrency":3,"systemStatus":{"isSystemIdle":true,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":0},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.6,"actualRatio":0},"cpuInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":0},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":0}}}
2024-10-19T14:54:49.152Z WARN PlaywrightCrawler: Reclaiming failed request back to the list or queue. page.evaluate: document.body is null
2024-10-19T14:54:49.153Z @debugger eval code line 226 > eval:1:7
2024-10-19T14:54:49.154Z evaluate@debugger eval code:228:17
2024-10-19T14:54:49.155Z @debugger eval code:1:44
2024-10-19T14:54:49.155Z
2024-10-19T14:54:49.156Z at CrawlerSetup._requestHandler (/home/myuser/dist/internals/crawler_setup.js:379:35) {"id":"vwv0onJJ2YlCPdo","url":"https://apify.com/sitemap.xml","retryCount":1}
The failure appears to happen before the page function runs (the stack trace points to the Actor's own crawler_setup.js). For reference, here is the pageFunction I used:
async function pageFunction(context) {
    const { page, request, log } = context;
    async function pageEvaluate(context) {
        return {
            url: document.URL,
            html: document.body?.innerHTML ?? document.querySelector('urlset')?.innerHTML,
        };
    }
    let data = await page.evaluate(pageEva... [trimmed]
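Some background on the error message: when a browser loads a sitemap, the result is an XML document whose root element is `<urlset>`, so there is no `<body>` element and `document.body` evaluates to `null`. Any code that reads `document.body.innerHTML` without a null check throws in that situation. The pageFunction above already guards against this with `?.`, and the trace points into the Actor's own crawler_setup.js, so the sketch below does not change that internal code path; it only illustrates the null-safe pattern for user code. `getPageMarkup` is a hypothetical name, and it takes the document as a parameter so it can also be exercised outside a browser.

```javascript
// Hypothetical helper (not part of the Actor): return a page's markup for both
// HTML and XML documents. In an XML document such as a sitemap (root <urlset>)
// there is no <body>, so document.body is null and .innerHTML on it would throw.
function getPageMarkup(doc = globalThis.document) {
    // document.body is null when the document has no <body> element.
    const hasBody = doc.body != null;
    // Without a body, fall back to the serialized root element (e.g. the whole <urlset>).
    const markup = hasBody
        ? doc.body.innerHTML
        : doc.documentElement?.outerHTML ?? '';
    return { url: doc.URL, hasBody, markup };
}
```

Inside a pageFunction this could be called as `await page.evaluate(getPageMarkup)`: Playwright serializes the function and runs it in the page, where the default parameter resolves to the page's own `document`.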
Hello @oren_clearya,
Thank you for bringing this issue to our attention, and I apologize for the delayed response. I attempted to replicate this using a similar setup, and scraping the XML document worked as expected on my end. Unfortunately, the linked run has expired, so I cannot investigate further or reproduce the issue.
It’s possible that the error was caused by a temporary issue or specific conditions during the run. If you encounter this problem again, please provide a fresh run link and any additional context, and we’ll be happy to assist further.
I’ll close this issue for now, but don’t hesitate to create a new one if needed. Thank you for your understanding!
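If the browser-based Actor keeps failing on a sitemap, one possible workaround (not one suggested in this thread) is to skip rendering the sitemap in a browser: download it over plain HTTP, pull out the `<loc>` entries, and pass those URLs to the scraper as a plain list. The sketch below uses a regular expression, which handles ordinary `<urlset>` and `<sitemapindex>` files but is not a full XML parser. `decodeXmlEntities`, `extractSitemapUrls`, and `fetchSitemapUrls` are hypothetical names, and `fetchSitemapUrls` assumes Node.js 18 or later for the global `fetch`.

```javascript
// Hypothetical helpers: read page URLs out of a sitemap without rendering it in a browser.
// A regex is used for brevity; it handles plain <loc> entries but is not a full XML parser.

// Decode the five predefined XML entities (sitemap URLs must escape & as &amp;).
function decodeXmlEntities(text) {
    return text
        .replace(/&lt;/g, '<')
        .replace(/&gt;/g, '>')
        .replace(/&quot;/g, '"')
        .replace(/&apos;/g, "'")
        .replace(/&amp;/g, '&'); // last, so "&amp;lt;" becomes "&lt;" rather than "<"
}

// Return every <loc> value from a <urlset> or <sitemapindex> document, in order,
// with surrounding whitespace trimmed.
function extractSitemapUrls(xml) {
    const urls = [];
    const locPattern = /<loc>\s*([\s\S]*?)\s*<\/loc>/g;
    for (const match of xml.matchAll(locPattern)) {
        urls.push(decodeXmlEntities(match[1]));
    }
    return urls;
}

// Download a sitemap over HTTP (Node.js 18+ global fetch) and extract its URLs.
async function fetchSitemapUrls(sitemapUrl) {
    const response = await fetch(sitemapUrl);
    if (!response.ok) {
        throw new Error(`Failed to fetch ${sitemapUrl}: HTTP ${response.status}`);
    }
    return extractSitemapUrls(await response.text());
}
```

For example, `fetchSitemapUrls('https://apify.com/sitemap.xml').then((urls) => console.log(urls.length))` prints how many URLs the sitemap lists. For a `<sitemapindex>`, the returned URLs are child sitemaps, which would need to be fetched the same way.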