Zalando Scraper
Under maintenance
This Actor is under maintenance.

lhotanova/zalando-scraper

Scrape product data from Zalando, such as images, prices, brands or product attributes. You can extract data from any of the available Zalando domains - zalando.co.uk, zalando.de, zalando.fr, zalando.it and others. Search products by categories or provide URLs of concrete products.


Crawler Not Working

Open

umermansoor opened this issue
4 months ago

First, I tried the crawler on "https://www.zalando.co.uk/mens-clothing-t-shirts/". It scraped 8,268 products, but the website listed 11,461, so it missed more than 3,000 products.

Then I tried to scrape "https://www.zalando.co.uk/womens-clothing-tops/", but the scraper is not working at all.


umermansoor

4 months ago

2024-04-02T07:10:49.403Z ACTOR: Pulling Docker image of build OhXlBnLMsSCfpjqUG from repository.
2024-04-02T07:10:51.169Z ACTOR: Creating Docker container.
2024-04-02T07:10:51.510Z ACTOR: Starting Docker container.
2024-04-02T07:10:53.964Z INFO System info {"apifyVersion":"3.1.14","apifyClientVersion":"2.8.4","crawleeVersion":"3.7.1","osType":"Linux","nodeVersion":"v16.20.2"}
2024-04-02T07:10:54.844Z INFO CheerioCrawler: Starting the crawler.
2024-04-02T07:10:57.363Z INFO CheerioCrawler: Opened category page: Women's Tops | Logo T-Shirts | Zalando {"url":"https://www.zalando.co.uk/womens-clothing-tops/"}
2024-04-02T07:10:57.381Z WARN CheerioCrawler: Reclaiming failed request back to the list or queue. Response could not be parsed
2024-04-02T07:10:57.382Z at tryParseReponse (file:///usr/src/app/dist/utils.js:23:15) {"id":"BSSn2gMCCrYT1LQ","url":"https://www.zalando.co.uk/womens-clothing-tops/","retryCount":1}
2024-04-02T07:11:02.269Z INFO CheerioCrawler: Opened category page: Women's Tops | Logo T-Shirts | Zalando {"url":"https://www.zalando.co.uk/womens-clothing-tops/"}
2024-04-02T07:11:02.271Z WARN CheerioCrawler: Reclaiming failed request back to the list or queue. Response could not be parsed
2024-04-02T07:11:02.272Z at tryParseReponse (file:///usr/src/app/dist/utils.js:23:15) {"id":"BSSn2gMCCrYT1LQ","url":"https://www.zalando.co.uk/womens-clothing-tops/","retryCount":2}
2024-04-02T07:11:07.849Z INFO CheerioCrawler: Opened category page: Women's Tops | Logo T-Shirts | Zalando {"url":"https://www.zalando.co.uk/womens-clothing-tops/"}
2024-04-02T07:11:07.858Z WARN CheerioCrawler: Reclaiming failed request back to the list or queue. Response could not be parsed
2024-04-02T07:11:07.859Z at tryParseReponse (file:///usr/src/app/dist/utils.js:23:15) {"id":"BSSn2gMCCrYT1LQ","url":"https://www.zalando.co.uk/womens-clothing-tops/","retryCount":3}
2024-04-02T07:11:12.893Z INFO CheerioCrawler: Opened category page: Women's Tops | Logo T-Shirts | Zalando {"url":"https://www.zalando.co.uk/womens-clothing-tops/"}
2024-04-02T07:11:12.981Z ERROR CheerioCrawler: Request failed and reached maximum retries. Error: Response could not be parsed
2024-04-02T07:11:12.982Z at tryParseReponse (file:///usr/src/app/dist/utils.js:23:15)
2024-04-02T07:11:12.983Z at parseGraphqlProductUrls (file:///usr/src/app/dist/utils.js:44:30)
2024-04-02T07:11:12.984Z at categoryRoute (file:///usr/src/app/dist/routes/categoryRoute.js:7:32)
2024-04-02T07:11:12.985Z at CheerioCrawler.func [as requestHandler] (/usr/src/app/node_modules/@crawlee/core/router.js:172:44)
2024-04-02T07:11:12.986Z at /usr/src/app/node_modules/@crawlee/http/internals/http-crawler.js:347:87
2024-04-02T07:11:12.986Z at wrap (/usr/src/app/node_modules/@apify/timeout/index.js:52:27)
2024-04-02T07:11:12.988Z at /usr/src/app/node_modules/@apify/timeout/index.js:66:7
2024-04-02T07:11:12.989Z at AsyncLocalStorage.run (node:async_hooks:319:14)
2024-04-02T07:11:12.990Z at /usr/src/app/node_modules/@apify/timeout/index.js:65:13
2024-04-02T07:11:12.991Z at new Promise (

lhotanok

Hello, thank you for reporting the issue! The URL "https://www.zalando.co.uk/womens-clothing-tops/" should be working again, along with other URLs.

I also tested your input URL: https://www.zalando.co.uk/mens-clothing-t-shirts/ and the Actor scraped almost all results this time (16,143 out of 16,374 items):

https://console.apify.com/view/runs/2XVcSv5tceCSMNCXl

I'll keep this issue open for now as the number of results still isn't precisely the same as shown on the website, so it needs further investigation.
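To make the gap between scraped and listed items easier to track across runs, here is a minimal sketch that computes the missing count and coverage percentage. The `coverage` helper and its `Coverage` type are hypothetical names introduced for illustration; the figures are the ones quoted in this thread.

```typescript
// Hypothetical helper (not part of the Actor): compares the number of
// scraped items with the number of items listed on the website.
interface Coverage {
    missing: number; // items listed on the site but not scraped
    percent: number; // share of listed items scraped, rounded to one decimal
}

function coverage(scraped: number, listed: number): Coverage {
    if (listed <= 0) {
        throw new Error('listed must be a positive number');
    }
    return {
        missing: listed - scraped,
        percent: Math.round((scraped / listed) * 1000) / 10,
    };
}

// Figures quoted in this thread.
const firstRun = coverage(8268, 11461); // original report: 3,193 missing, 72.1%
const laterRun = coverage(16143, 16374); // maintainer's test run: 231 missing, 98.6%
```

A check like this could flag a run as suspicious when the percentage drops below a chosen threshold, although the listed count on the website may itself change between runs.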


tuhin_mallick

2 months ago

The Actor is not working correctly again. Is the problem the GraphQL ID, which Zalando keeps rotating?

lhotanova

Hello, thanks for letting me know!

"Is it the problem with graphql id which Zalando keeps rotating?"

Yeah, the issue keeps recurring because Zalando regularly changes its GraphQL IDs. The Actor uses hardcoded IDs, which is far from ideal, although the IDs can be updated very quickly. I've been postponing a rewrite of the Actor to a version without hardcoded IDs for quite some time. Recently, the issue has been coming back often, so I'll prioritize the rewrite next week to get the Actor into a more stable state.
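One way to avoid hardcoded IDs is to read the current ID from the page source at run time and keep the hardcoded value only as a fallback. The sketch below is an illustration, not the Actor's actual code: it assumes the IDs appear as 64-character hexadecimal strings in an `"id"` field, which may not match Zalando's real markup, and the names `extractQueryIds` and `resolveQueryId` are hypothetical.

```typescript
// ASSUMPTION: query IDs show up in the page source as "id":"<64 hex chars>".
// The real format used by Zalando may differ; adjust the pattern accordingly.
const QUERY_ID_SOURCE = '"id"\\s*:\\s*"([0-9a-f]{64})"';

// Returns every distinct ID found in the page source, in order of appearance.
function extractQueryIds(pageSource: string): string[] {
    const pattern = new RegExp(QUERY_ID_SOURCE, 'g'); // fresh regex, fresh lastIndex
    const ids: string[] = [];
    let match = pattern.exec(pageSource);
    while (match !== null) {
        const id = match[1];
        if (id !== undefined && ids.indexOf(id) === -1) {
            ids.push(id);
        }
        match = pattern.exec(pageSource);
    }
    return ids;
}

// Prefers an ID discovered on the page; falls back to the hardcoded one.
function resolveQueryId(pageSource: string, fallbackId: string): string {
    const first = extractQueryIds(pageSource)[0];
    return first !== undefined ? first : fallbackId;
}
```

With this approach, a rotated ID would be picked up on the next run without a new build, as long as the assumed pattern still matches the page.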

However, there seems to be a bigger update on Zalando's side this time: product data is stored differently in the HTML. That change needs to be addressed first. I'll keep you updated.
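To illustrate why a change in how data is stored in the HTML breaks parsing, here is a minimal sketch that collects JSON embedded in `<script>` tags. It assumes the data sits in `application/json` or `application/ld+json` script blocks, which is a guess and not Zalando's confirmed markup; `extractJsonScripts` is a hypothetical name.

```typescript
// ASSUMPTION: product data is embedded as JSON inside <script> tags with a
// JSON content type. The attributes below are illustrative only.
function extractJsonScripts(html: string): unknown[] {
    const scriptPattern =
        /<script[^>]*type="application\/(?:ld\+)?json"[^>]*>([\s\S]*?)<\/script>/gi;
    const results: unknown[] = [];
    let match = scriptPattern.exec(html);
    while (match !== null) {
        try {
            results.push(JSON.parse(match[1]));
        } catch (err) {
            // Skip blocks that are not valid JSON instead of failing the whole page.
        }
        match = scriptPattern.exec(html);
    }
    return results;
}
```

An empty result for a page that clearly lists products would suggest that the markup has changed and the extraction logic needs updating, rather than that the request should simply be retried.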

Developer
Maintained by Community
Actor metrics
  • 7 monthly users
  • 3 stars
  • 100.0% runs succeeded
  • Created in May 2023
  • Modified 2 months ago