Zalando Scraper
This Actor may be unreliable while under maintenance.
Scrape product data from Zalando, such as images, prices, brands or product attributes. You can extract data from any of the available Zalando domains, including zalando.co.uk, zalando.de, zalando.fr, zalando.it and others. Search products by category or provide URLs of specific products.
First I tried the crawler on "https://www.zalando.co.uk/mens-clothing-t-shirts/" and it scraped 8,268 products, but 11,461 products were available on the website, so it missed more than 3,000 products.
Then I tried to scrape "https://www.zalando.co.uk/womens-clothing-tops/" but the scraper is not working at all.
2024-04-02T07:10:49.403Z ACTOR: Pulling Docker image of build OhXlBnLMsSCfpjqUG from repository.
2024-04-02T07:10:51.169Z ACTOR: Creating Docker container.
2024-04-02T07:10:51.510Z ACTOR: Starting Docker container.
2024-04-02T07:10:53.964Z INFO System info {"apifyVersion":"3.1.14","apifyClientVersion":"2.8.4","crawleeVersion":"3.7.1","osType":"Linux","nodeVersion":"v16.20.2"}
2024-04-02T07:10:54.844Z INFO CheerioCrawler: Starting the crawler.
2024-04-02T07:10:57.363Z INFO CheerioCrawler: Opened category page: Women's Tops | Logo T-Shirts | Zalando {"url":"https://www.zalando.co.uk/womens-clothing-tops/"}
2024-04-02T07:10:57.381Z WARN CheerioCrawler: Reclaiming failed request back to the list or queue. Response could not be parsed
2024-04-02T07:10:57.382Z at tryParseReponse (file:///usr/src/app/dist/utils.js:23:15) {"id":"BSSn2gMCCrYT1LQ","url":"https://www.zalando.co.uk/womens-clothing-tops/","retryCount":1}
2024-04-02T07:11:02.269Z INFO CheerioCrawler: Opened category page: Women's Tops | Logo T-Shirts | Zalando {"url":"https://www.zalando.co.uk/womens-clothing-tops/"}
2024-04-02T07:11:02.271Z WARN CheerioCrawler: Reclaiming failed request back to the list or queue. Response could not be parsed
2024-04-02T07:11:02.272Z at tryParseReponse (file:///usr/src/app/dist/utils.js:23:15) {"id":"BSSn2gMCCrYT1LQ","url":"https://www.zalando.co.uk/womens-clothing-tops/","retryCount":2}
2024-04-02T07:11:07.849Z INFO CheerioCrawler: Opened category page: Women's... [trimmed]
Hello, thank you for reporting the issue! The URL "https://www.zalando.co.uk/womens-clothing-tops/" should be working again, along with other URLs.
I also tested your input URL: https://www.zalando.co.uk/mens-clothing-t-shirts/ and the Actor scraped almost all results this time (16,143 out of 16,374 items):
https://console.apify.com/view/runs/2XVcSv5tceCSMNCXl
I'll keep this issue open for now as the number of results still isn't precisely the same as shown on the website, so it needs further investigation.
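For reference, the gap between scraped and listed items in the two runs quoted in this thread can be put into numbers with a quick calculation. This is only a sketch; the `coverage` helper is a name introduced here for illustration and is not part of the Actor.

```python
from typing import Tuple


def coverage(scraped: int, listed: int) -> Tuple[int, float]:
    """Return (missing item count, percentage of listed items scraped)."""
    if listed <= 0:
        raise ValueError("listed must be positive")
    missing = listed - scraped
    percent = round(100 * scraped / listed, 1)
    return missing, percent


# Figures quoted in this thread for the men's T-shirts category.
first_run = coverage(8268, 11461)    # run from the original report
second_run = coverage(16143, 16374)  # run after the fix
print(first_run, second_run)
```

By this measure the first run missed 3,193 items (about 72.1% coverage) and the later run missed 231 (about 98.6% coverage).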
The Actor is not working correctly again. Is the problem the GraphQL ID, which Zalando keeps rotating?
Hello, thanks for letting me know!
Is the problem the GraphQL ID, which Zalando keeps rotating?
Yeah, the issue keeps coming back because the GraphQL IDs change regularly. The Actor uses hardcoded IDs, which is far from ideal, although the IDs can be updated very quickly. I've been postponing a rewrite of the Actor to a version without hardcoded IDs for quite some time. Recently the issue has been coming back often, so I'll prioritize the rewrite next week to get the Actor into a more stable state.
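One possible way to avoid hardcoded IDs, sketched here under loud assumptions: discover the current ID from the page source at the start of each run, and use a last-known value only when discovery fails. The sample snippet, the `"id"` key name, the 64-character hex format, and the `find_query_id` helper are all hypothetical; Zalando's real markup and the eventual rewrite of the Actor may look nothing like this.

```python
import re
from typing import Optional

# Hypothetical format: a 64-char lowercase hex string stored under an "id" key.
QUERY_ID_PATTERN = re.compile(r'"id"\s*:\s*"([0-9a-f]{64})"')


def find_query_id(page_source: str, fallback: Optional[str] = None) -> Optional[str]:
    """Return the first ID found in page_source, or fallback if none is found."""
    match = QUERY_ID_PATTERN.search(page_source)
    return match.group(1) if match else fallback


# Made-up page fragment, used only to exercise the helper.
sample = '<script>window.q = {"id": "' + "ab" * 32 + '", "variables": {}}</script>'
print(find_query_id(sample))
print(find_query_id("<html></html>", fallback="last-known-id"))
```

The point of the fallback is that a failed discovery degrades to the current hardcoded behaviour instead of stopping the run outright.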
However, there seems to be a bigger update on Zalando's side this time: product data is now stored differently in the HTML. So this issue needs to be addressed first. I'll keep you updated.
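When product data moves around inside the HTML like this, a parser that collects every embedded JSON block and skips the ones that fail to decode tends to be more resilient than one that expects a single fixed location. A minimal sketch, assuming the data sits in `<script type="application/json">` tags (an assumption for illustration, not a confirmed detail of Zalando's pages); `EmbeddedJsonParser` and `extract_embedded_json` are hypothetical names:

```python
import json
from html.parser import HTMLParser
from typing import Any, List


class EmbeddedJsonParser(HTMLParser):
    """Collects decoded payloads from <script type="application/json"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self._in_json_script = False
        self._buffer: List[str] = []
        self.payloads: List[Any] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/json":
            self._in_json_script = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_json_script:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_script:
            self._in_json_script = False
            try:
                self.payloads.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip blocks that are not valid JSON instead of failing.


def extract_embedded_json(html: str) -> List[Any]:
    """Return every JSON payload that could be decoded from the page."""
    parser = EmbeddedJsonParser()
    parser.feed(html)
    parser.close()
    return parser.payloads


# Made-up page: one valid block, one broken block, one ordinary script.
page = (
    '<html><script type="application/json">{"sku": "A1", "price": 12.5}</script>'
    '<script type="application/json">{broken</script>'
    "<script>var x = 1;</script></html>"
)
print(extract_embedded_json(page))
```

A caller can then search the returned payloads for product fields and treat an empty list as a signal that the page layout changed.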
Actor Metrics
4 monthly users
3 stars
>99% runs succeeded
Created in May 2023
Modified 6 months ago