
Playwright Scraper
Pricing
Pay per usage

Crawls websites with a headless Chromium, Chrome, or Firefox browser and the Playwright library, using server-side Node.js code that you provide. Supports both recursive crawling and a list of URLs. Supports logging in to a website.
4.3 (7)
31
Monthly users: 142
Runs succeeded: 98%
Response time: 3.1 days
Last modified: 10 months ago
When trying to scrape a sitemap.xml - getting back a "document.body is null" error
When I run the Playwright Scraper Actor with a startUrl that points to a sitemap XML, I get the following error:
2024-10-19T14:54:41.339Z DEBUG PlaywrightCrawler:AutoscaledPool: scaling up {"oldConcurrency":2,"newConcurrency":3,"systemStatus":{"isSystemIdle":true,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":0},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.6,"actualRatio":0},"cpuInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":0},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":0}}}
2024-10-19T14:54:49.152Z WARN PlaywrightCrawler: Reclaiming failed request back to the list or queue. page.evaluate: document.body is null
2024-10-19T14:54:49.153Z @debugger eval code line 226 > eval:1:7
2024-10-19T14:54:49.154Z evaluate@debugger eval code:228:17
2024-10-19T14:54:49.155Z @debugger eval code:1:44
2024-10-19T14:54:49.155Z
2024-10-19T14:54:49.156Z at CrawlerSetup._requestHandler (/home/myuser/dist/internals/crawler_setup.js:379:35) {"id":"vwv0onJJ2YlCPdo","url":"https://apify.com/sitemap.xml","retryCount":1}
It seems to fail before reaching the page function itself. However, here is the pageFunction that was used:
async function pageFunction(context) {
    const { page, request, log } = context;

    async function pageEvaluate(context) {
        return {
            url: document.URL,
            html: document.body?.innerHTML ?? document.querySelector('urlset')?.innerHTML,
        };
    }

    let data = await page.evaluate(pageEva... [trimmed]
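The fallback used in the page function above can be isolated into a small helper, sketched below. It is only an illustration of reading markup from a document that has no body, which is the case for an XML sitemap. The name extractMarkup and the extra documentElement fallback are assumptions, not part of the Actor. Since the log points at crawler_setup.js rather than the page function, this sketch may not address the reported failure.

```javascript
// Hypothetical helper (not part of the Actor): read the markup of a document
// that may have no <body>, as is the case for an XML sitemap.
function extractMarkup(doc) {
    // Prefer <body>, then the sitemap's <urlset>, then the document root.
    const root = doc.body ?? doc.querySelector('urlset') ?? doc.documentElement;
    return {
        url: doc.URL,
        html: root ? root.innerHTML : null,
    };
}

// Sketch of a page function that runs the helper in the browser context.
// page.evaluate accepts a string expression, so the helper's source is
// inlined and called with the page's own `document`.
// Assumption: if only the page function is submitted as Actor input,
// extractMarkup would have to be declared inside it.
async function pageFunction(context) {
    const { page } = context;
    return page.evaluate(`(${extractMarkup.toString()})(document)`);
}
```

Passing the document in as a parameter keeps the helper free of browser globals, so it can be checked in plain Node.js with a stub object before it is used in a run.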
Hello @oren_clearya,
Thank you for bringing this issue to our attention, and I apologize for the delayed response. I attempted to replicate this using a similar setup, and scraping the XML document worked as expected on my end. Unfortunately, the linked run has expired, so I cannot investigate further or reproduce the issue.
It’s possible that the error was caused by a temporary issue or specific conditions during the run. If you encounter this problem again, please provide a fresh run link and any additional context, and we’ll be happy to assist further.
I’ll close this issue for now, but don’t hesitate to create a new one if needed. Thank you for your understanding!
Pricing model
Pay per usage
This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage.