Macy's Scraper
Pay $9.00 for 1,000 results
Macy's web scraper that crawls product information, including price, sale price, color, and images, and exports all the data as a dataset in structured formats.
Hi Gustavo, https://www.macys.com/shop/womens-clothing/all-womens-clothing/Pageindex/246?id=188851
I specified the link above as the page to scrape: all women's clothing, page 246. However, the log shows the following:
https://www.macys.com/shop/mens-clothing/all-mens-clothing/Pageindex/1217?id=197651
https://www.macys.com/shop/mens-clothing/all-mens-clothing/Pageindex/1216?id=197651
https://www.macys.com/shop/mens-clothing/all-mens-clothing/Pageindex/1215?id=197651
https://www.macys.com/shop/mens-clothing/all-mens-clothing/Pageindex/1214?id=197651
https://www.macys.com/shop/womens-clothing/all-womens-clothing/Pageindex/247?id=188851
For some reason it went over to men's, to pages 1217, 1216, 1215, and 1214. Why is that?
Also, runs f2lwGO06ELesgyJrS and deuQAydeWf0rgKT0J have the same issue: some pages are scraped from men's.
Run ID: sLTYaK3ytuzSbQa6n. I set the start URL to https://www.macys.com/shop/mens-clothing/all-mens-clothing/Pageindex/881?id=197651 (page 881). However, the link it actually started scraping from was https://www.macys.com/shop/mens-clothing/all-mens-clothing/Pageindex/1267?id=197651. I haven't checked everything yet, but so far every run I have checked shows a weird pattern.
Those are failed URLs from previous runs.
"Those are failed URLs from previous runs." Question 1: If I set https://www.macys.com/shop/womens-clothing/all-womens-clothing/Pageindex/246?id=188851 (women's page 246), why is it attempting to scrape men's page 1214?
Run ID: v9PI0NTpbxve7G6Nu
I ran this task from page 600 and it only obtained 16 results. Question 2: Do I get charged for running a task when there are failed URLs collected from previous runs? Question 3: I recall you mentioning that a run sometimes stops as Succeeded, maybe because Macy's is blocking it from proceeding. Is there a way to fix or bypass that?
If you run once for all-mens-clothing, all failed URLs from that run are stored to be retried in the next run. If the second run is for all-womens-clothing, the previously failed URLs from all-mens-clothing are also added to its queue. I will take a closer look at this run; it seems that some products are returning errors and are not being scraped. I didn't understand your second question: Apify charges for the resources spent running the actor. It is not possible to bypass 100% of the anti-scraping measures set by any website; you need to retry the request with different sessions (which Apify does automatically) to eventually get through.
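To make the carry-over described above concrete, here is a minimal Python sketch. It is not the actor's actual code, only a model of the behaviour as explained in this thread: `build_queue`, `category_of`, and `split_by_category` are hypothetical helper names, and the assumption is that failed URLs are simply appended to the next run's queue regardless of category. The second helper may be handy for separating carried-over URLs from the ones belonging to your start URL when reading a log.

```python
from urllib.parse import urlparse


def category_of(url: str) -> str:
    """Return the listing path before '/Pageindex/'.

    For the URLs in this thread that is, for example,
    '/shop/mens-clothing/all-mens-clothing'.
    """
    return urlparse(url).path.split("/Pageindex/")[0]


def build_queue(start_url: str, carried_over_failures: list[str]) -> list[str]:
    """Hypothetical model: the new start URL plus every URL that failed
    in an earlier run, with no filtering by category (assumption)."""
    queue = [start_url]
    for url in carried_over_failures:
        if url not in queue:
            queue.append(url)
    return queue


def split_by_category(urls: list[str], start_url: str) -> tuple[list[str], list[str]]:
    """Split URLs into (same category as start_url, other categories)."""
    wanted = category_of(start_url)
    same = [u for u in urls if category_of(u) == wanted]
    other = [u for u in urls if category_of(u) != wanted]
    return same, other


start = "https://www.macys.com/shop/womens-clothing/all-womens-clothing/Pageindex/246?id=188851"
failed_earlier = [
    "https://www.macys.com/shop/mens-clothing/all-mens-clothing/Pageindex/1217?id=197651",
    "https://www.macys.com/shop/mens-clothing/all-mens-clothing/Pageindex/1216?id=197651",
]

queue = build_queue(start, failed_earlier)
same, other = split_by_category(queue, start)
```

Under these assumptions, `other` ends up holding the two men's URLs even though the run was started on a women's page, which matches the pattern seen in the logs.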
There was a product page with a collection that wasn't being scraped because its layout was composed of multiple products. I have added this new layout to the actor, and the products from these URLs will now be scraped as well.
Actor Metrics
2 monthly users
4 stars
88% runs succeeded
Created in Dec 2019
Modified 4 days ago