Playwright Scraper

apify/playwright-scraper
Crawls websites using headless Chromium, Chrome, or Firefox and the Playwright library, running server-side Node.js code that you provide. Supports both recursive crawling and lists of URLs, as well as logging in to websites.

Scraping a sitemap.xml fails with a "document.body is null" error

Open

oren_clearya opened this issue 2 months ago

When I run the Playwright Scraper Actor with a sitemap XML file as the startUrl, I get the following error:

```
2024-10-19T14:54:41.339Z DEBUG PlaywrightCrawler:AutoscaledPool: scaling up {"oldConcurrency":2,"newConcurrency":3,"systemStatus":{"isSystemIdle":true,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":0},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.6,"actualRatio":0},"cpuInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":0},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":0}}}
2024-10-19T14:54:49.152Z WARN  PlaywrightCrawler: Reclaiming failed request back to the list or queue. page.evaluate: document.body is null
2024-10-19T14:54:49.153Z @debugger eval code line 226 > eval:1:7
2024-10-19T14:54:49.154Z evaluate@debugger eval code:228:17
2024-10-19T14:54:49.155Z @debugger eval code:1:44
2024-10-19T14:54:49.155Z
2024-10-19T14:54:49.156Z     at CrawlerSetup._requestHandler (/home/myuser/dist/internals/crawler_setup.js:379:35) {"id":"vwv0onJJ2YlCPdo","url":"https://apify.com/sitemap.xml","retryCount":1}
```

It seems to fail before the page function is even reached. For reference, here is the pageFunction that was used:

```
async function pageFunction(context) {
  const { page, request, log } = context;

  async function pageEvaluate(context) {
    return {
      url: document.URL,
      html: document.body?.innerHTML ?? document.querySelector('urlset')?.innerHTML,
    };
  }

  let data = await page.evaluate(pageEva... [trimmed]
```
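The stack trace points at `CrawlerSetup._requestHandler` in the Actor's own `crawler_setup.js`, which suggests that the failing `document.body` access is not the one in this pageFunction, so the optional chaining there would not prevent it. A sitemap served as XML is likely loaded by the browser as an XML document with no `<body>` element, which would explain why `document.body` is null. One possible workaround, not confirmed in this issue, is to read the sitemap outside the browser and pass its page URLs to the Actor as ordinary start URLs. The sketch below illustrates that idea under stated assumptions: `extractSitemapUrls` and `toStartUrls` are hypothetical helpers, the regex approach only handles plain `<loc>` elements (no CDATA or namespace prefixes), and the `{ url }` object shape for start URLs is assumed rather than taken from the Actor's documentation.

```javascript
// Hypothetical helper (not part of the Actor): pull <loc> URLs out of sitemap XML text.
// Regex-based on purpose: it assumes a well-formed sitemap with plain <loc> elements
// and is not a general-purpose XML parser.
function extractSitemapUrls(xmlText) {
  // Decode the five predefined XML entities that may appear inside <loc>.
  const decode = (s) =>
    s
      .replace(/&lt;/g, '<')
      .replace(/&gt;/g, '>')
      .replace(/&quot;/g, '"')
      .replace(/&apos;/g, "'")
      .replace(/&amp;/g, '&'); // decode &amp; last so "&amp;lt;" becomes "&lt;", not "<"

  const urls = [];
  // Capture the text between <loc> and </loc>, trimming surrounding whitespace.
  const locPattern = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  for (const match of xmlText.matchAll(locPattern)) {
    urls.push(decode(match[1]));
  }
  return urls;
}

// Hypothetical helper: wrap each URL in a { url } object, ASSUMING that is the
// shape the Actor's startUrls input expects (check the Actor's input schema).
function toStartUrls(urls) {
  return urls.map((url) => ({ url }));
}

// Usage with an inline sample sitemap (no network access needed).
const sample = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/a</loc></url>
  <url><loc>https://example.com/b?x=1&amp;y=2</loc></url>
</urlset>`;
console.log(toStartUrls(extractSitemapUrls(sample)));
```

In Node.js 18 or later, the XML text could come from the global `fetch`, for example `await (await fetch(sitemapUrl)).text()`. Sitemap index files, which list further sitemaps in their own `<loc>` elements, would need a second pass over each referenced sitemap.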
Developer
Maintained by Apify

Actor Metrics

  • 67 monthly users

  • 18 stars

  • >99% runs succeeded

  • 54 days response time

  • Created in Aug 2022

  • Modified 6 months ago
