Zoopla.co.uk Scraper

3 days trial then $30.00/month - No credit card required now

dhrumil/zoopla-scraper

Scrape Zoopla.co.uk to crawl millions of real estate properties for sale or rent in the United Kingdom. Our real estate scraper also lets you monitor specific listings for updates and new listings. You can provide multiple search result listings to scrape or monitor.

Commercial URLs not parsing

Closed

quiche opened this issue
8 months ago

Hi Dhrumil - thanks for jumping on this so quickly. Unfortunately you have closed the previous issue so I cannot reply in the same ticket. The following runs have both failed (input is a single commercial URL, for sale in the first and to rent in the second):

https://console.apify.com/actors/teOj85DgYAQSZUKeJ/runs/bkFm3FI891Bi9BndQ
https://console.apify.com/actors/teOj85DgYAQSZUKeJ/runs/qtmvz6lPjXxKV6qcL

dhrumil

This was an update specific to commercial properties. I have applied it and it should work now. Please confirm and I will close the issue after that.

quiche

8 months ago

I'm still getting the same error. I assume I don't need to do anything to update the scraper and can just start a new run, correct?

2024-03-26T21:15:36.400Z INFO Page opened. {"url":"https://www.zoopla.co.uk/to-rent/commercial/property/AB10/?page_size=100"}
2024-03-26T21:15:44.673Z ERROR PuppeteerCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://www.zoopla.co.uk/to-rent/commercial/property/AB10/?page_size=100","retryCount":1,"id":"wARZrT3jp0i6HRJ"}
2024-03-26T21:15:44.676Z Error: Evaluation failed: TypeError: Cannot read properties of undefined (reading 'split')
2024-03-26T21:15:44.678Z at :2:92
2024-03-26T21:15:44.680Z at ExecutionContext._evaluateInternal (/home/myuser/node_modules/puppeteer/lib/cjs/puppeteer/common/ExecutionContext.js:221:19)
2024-03-26T21:15:44.682Z at processTicksAndRejections (node:internal/process/task_queues:96:5)
2024-03-26T21:15:44.684Z at async ExecutionContext.evaluate (/home/myuser/node_modules/puppeteer/lib/cjs/puppeteer/common/ExecutionContext.js:110:16)
2024-03-26T21:15:44.686Z at async Object.module.exports [as handleStart] (/home/myuser/src/routes/start.js:9:22)
2024-03-26T21:15:44.689Z at async wrap (/home/myuser/node_modules/@apify/timeout/index.js:73:27)

https://console.apify.com/actors/teOj85DgYAQSZUKeJ/runs/SNQxCB9PkdPu7dBeN#output

dhrumil

Sorry, a new version containing this fix was still waiting to be published. It's published now. No, you don't need to do anything specific. By default, a run always uses the latest version.
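The `Cannot read properties of undefined (reading 'split')` error in the log above is typical of calling `.split()` on a value that a page lookup did not return. The actor's source is not shown in this thread, so the following is only a minimal sketch of a defensive pattern. The helper name `parseTotalResults` and the assumed text format (a count followed by a label, such as "51 results") are hypothetical.

```javascript
// Hypothetical helper: not the actor's actual code.
// Parses a leading count from text such as "51 results" or "1,234 results".
// Returns null instead of throwing when the text is missing or malformed.
function parseTotalResults(text) {
  // Guard first: calling .split() on undefined is what raises the TypeError.
  if (typeof text !== 'string') return null;

  const firstToken = text.trim().split(' ')[0];
  const count = Number.parseInt(firstToken.replace(/,/g, ''), 10);
  return Number.isNaN(count) ? null : count;
}
```

Returning `null` lets the caller decide whether a missing count means an unsupported page layout or a blocked request, rather than failing the whole request with an exception.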

quiche

8 months ago

Ok - that's improved it but still not quite there. No longer getting the split error but it isn't picking up any properties on the list (there should be 51):

2024-03-27T08:25:35.848Z ACTOR: Pulling Docker image of build yoNTqkuuhY6bt7Wwt from repository.
2024-03-27T08:26:10.679Z ACTOR: Creating Docker container.
2024-03-27T08:26:11.218Z ACTOR: Starting Docker container.
2024-03-27T08:26:11.813Z Starting X virtual framebuffer using: Xvfb :99 -ac -screen 0 1920x1080x24+32 -nolisten tcp
2024-03-27T08:26:11.815Z Executing main command
2024-03-27T08:26:13.571Z INFO System info {"apifyVersion":"2.3.1","apifyClientVersion":"2.6.1","osType":"Linux","nodeVersion":"v16.20.2"}
2024-03-27T08:26:14.113Z INFO Starting the crawl.
2024-03-27T08:26:14.165Z INFO PuppeteerCrawler:AutoscaledPool: state {"currentConcurrency":0,"desiredConcurrency":2,"systemStatus":{"isSystemIdle":true,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":null},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.6,"actualRatio":null},"cpuInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":null},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":null}}}
2024-03-27T08:26:18.019Z INFO Page opened. {"url":"https://www.zoopla.co.uk/to-rent/commercial/property/AB10/?page_size=100"}
2024-03-27T08:26:18.024Z INFO Total pages: null
2024-03-27T08:26:18.196Z INFO properties on list page : 0
2024-03-27T08:26:18.293Z INFO PuppeteerCrawler: All the requests from request list ... [trimmed]

dhrumil

The page was getting blocked. In this scenario the scraper now rotates the proxy and retries the request. Please try again.
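The rotate-and-retry approach described here can be illustrated roughly as follows. This is a plain JavaScript sketch under stated assumptions, not the actor's implementation and not an Apify SDK API: `fetchWithProxyRotation`, the `fetchListing` callback, and the `{ totalPages, properties }` result shape are all hypothetical. It treats a result with no page count and no properties, like the `Total pages: null` and `properties on list page : 0` lines in the log above, as a likely block.

```javascript
// Hypothetical sketch: names and result shape are assumptions for illustration.
// Tries each proxy in turn until a request returns a non-empty listing page.
async function fetchWithProxyRotation(url, proxies, fetchListing, maxAttempts = proxies.length) {
  if (proxies.length === 0) throw new Error('No proxies configured');

  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    // Rotate: pick the next proxy on every attempt.
    const proxy = proxies[attempt % proxies.length];
    try {
      const result = await fetchListing(url, proxy);
      // Heuristic: no page count and no properties suggests a blocked page.
      if (result.totalPages === null && result.properties.length === 0) {
        throw new Error(`Possible block when using proxy ${proxy}`);
      }
      return result;
    } catch (err) {
      lastError = err; // Remember the failure and retry with the next proxy.
    }
  }
  throw lastError;
}
```

The empty-result check is only a heuristic: a search with genuinely no listings would also be retried and eventually reported as an error, so a real scraper would probably need a more specific block signal.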

quiche

8 months ago

Great - 51 pages processed as expected. I'll try a larger batch tomorrow. Many thanks - will close this issue.

Developer
Maintained by Community

Actor Metrics

  • 5 monthly users

  • 3 stars

  • >99% runs succeeded

  • 0.2 hours response time

  • Created in Dec 2022

  • Modified 14 hours ago