Nordstrom Scraper
Pay $5.00 for 1,000 products
Nordstrom web scraper that crawls product information, including price, sale price, color, and images. Extract all data into a dataset, available in multiple formats.
I am using the Nordstrom Scraper and I am getting the response "Request failed with status code 400", which seems to be Apify-related.
This issue is very common, and we were forced to add a stricter retry mechanism to avoid failures, but we want to make sure there is no issue on our end. The request is populated with this body:
"data": "{\"startUrls\":[\"https://shop.nordstrom.com/s/olaplex-no-5-bond-maintenance-conditioner/5056281\",\"https://shop.nordstrom.com/s/shiseido-urban-environment-sun-dual-care-oil-free-mineral-broad-spectrum-sunscreen-spf-42/6739785\",\"https://shop.nordstrom.com/s/silkn-infinity-hair-removal-device/5376009\",\"https://shop.nordstrom.com/s/vapour-soft-focus-foundation/5850728\",\"https://shop.nordstrom.com/s/triple-peptide-cactus-oasis-serum/7166106\",\"https://shop.nordstrom.com/s/biggie-chain-necklace/5787614\",\"https://shop.nordstrom.com/s/gucci-the-alchemists-garden-the-last-day-of-summer-eau-de-parfum/5519933\",\"https://shop.nordstrom.com/s/charlotte-tilbury-airbrush-flawless-foundation/5368721?\",\"https://shop.nordstrom.com/s/acqua-di-parma-blu-mediterraneo-bergamotto-di-calabria-eau-de-toilette-spray/3269001\",\"https://shop.nordstrom.com/s/polished-pebble-leather-crossbody-bag-nordstrom-exclusive/7411875?\",\"ttps://www.nordstrom.com/s/mz-wallace-metro-crossbody-bag/5447687\",\"https://shop.nordstrom.com/s/gh-bass-larson-leather-penny-loafer-men/6541600\",\"https://shop.nordstrom.com/s/mini-weekender-bag/7574158... [trimmed]
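One entry in the body above starts with ttps:// rather than https://. Whether that contributes to the 400 is not established in this thread, but a quick pre-flight check can catch such entries before the request is sent. A minimal sketch, assuming the URLs are plain strings; the helper name split_valid_urls is hypothetical and not part of the actor or of Apify:

```python
from urllib.parse import urlsplit


def split_valid_urls(urls):
    """Separate URLs that have an http(s) scheme and a host from everything else."""
    valid, invalid = [], []
    for url in urls:
        parts = urlsplit(url.strip())
        if parts.scheme in ("http", "https") and parts.netloc:
            valid.append(url)
        else:
            invalid.append(url)
    return valid, invalid


# Two entries copied from the request body in this thread.
urls = [
    "https://shop.nordstrom.com/s/biggie-chain-necklace/5787614",
    "ttps://www.nordstrom.com/s/mz-wallace-metro-crossbody-bag/5447687",
]
valid, invalid = split_valid_urls(urls)
print("rejected:", invalid)
```

Rejected entries can be logged and fixed by hand instead of being sent to the actor.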
Seems like you are missing the country property used to get shipping information. Since it is a separate field, it is not related to the proxy settings and needs to be provided separately. I think in your case you can just use country: "United States".
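For illustration, an input with the country field set explicitly could be assembled as below. This is only a sketch: it assumes startUrls is a list of plain URL strings, as in the request body quoted earlier, and that country sits at the top level of the input. The helper build_input is hypothetical, and the actor's input schema remains the authority on field names.

```python
import json


def build_input(start_urls, country="United States"):
    """Build an input dict that always carries an explicit country field."""
    return {"startUrls": list(start_urls), "country": country}


payload = build_input(
    ["https://shop.nordstrom.com/s/olaplex-no-5-bond-maintenance-conditioner/5056281"]
)
print(json.dumps(payload, indent=2))
```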
Gustavo, thank you for your answer. The version I am using is 1.2.24, which as far as I can see has the country property as optional.
Also, I would like to know if there are any changelogs for older versions, since I am planning to upgrade to version 2.X of this actor. I could only find changelogs for 2.X here
That is the changelog. I just upgraded the Apify SDK and made some improvements. I believe you should be able to use it with no problems. I recommend that you always use the latest version, since that is the version I am always maintaining.
With the new version, the availability property is always populated as null, in contrast with the older version, where it is in the ApiProduct. Is there an issue with the stock status?
Checking
I was able to add the apiProduct to the results again, but I don't see availability on the products I have tried. Can you check with the latest version whether the data you want is in the extra.apiProduct property?
The apiProduct property is not consistently present in the responses; it depends on the links. I could not find a pattern that would help you debug this. Do you have any insights?
Do you have a run id that resulted in products without apiProduct data?
Yes, you can check out run ID: QyEKZjqo6THJzKB3d Container URL: https://urhfhdqn75wd.runs.apify.net/
I have updated the actor, removed the old availability and added a new property called "isAvailable". Please try it out and let me know if that works for you.
Thanks, I will check it and get back to you. Another thing I noticed is that the new version takes considerably more time. Check this dataset's log of the latest version (Dataset ID: SjiLZerVokyfrlUG0) compared to the one we are using currently (Dataset ID: 4aIFdbGrD0ahjRAMf). The new crawler version takes about 10 minutes for 60 links, while the old version takes about a minute. Can you please check this out too?
I have released a new version that should have improved speed when scraping product URLs.
I have attached a request body.
I have the same issue where I get 400 HTTP error Bad request.
I made 6 requests with the same body with a 1-minute difference, and all failed.
The requests took place on 5-11-2023 between 16:17 and 16:20 EET.
As you can see, the country is present in the request body, and the actor version I have been using is 2.7.2.
Can you share the run ID with me so I can also take a look at the logs?
There is no ID since the task was never started.
The list of "startUrls" is too big and Apify doesn't allow it. You can try to split it into multiple runs with fewer startUrls each, or try to use a file as the source of startUrls.
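Splitting the list into smaller batches, one per run, can be sketched as follows. The batch size used here is an arbitrary placeholder, since the actual limit is not stated in this thread, and the helper name chunk_urls is hypothetical:

```python
def chunk_urls(urls, size):
    """Split a list of URLs into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [urls[i:i + size] for i in range(0, len(urls), size)]


# Placeholder URLs and batch size, for illustration only.
all_urls = [f"https://shop.nordstrom.com/s/example-product/{n}" for n in range(5)]
batches = chunk_urls(all_urls, size=2)
for index, batch in enumerate(batches):
    print(f"run {index}: {len(batch)} startUrls")
```

Each batch would then be sent as the startUrls of its own run.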
It would be helpful to know what the limit of this list is, since "big" is too vague.
Agreed. Unfortunately, I wasn't able to find that information on Apify. That information seems to be missing from this page: https://docs.apify.com/platform/limits. I would recommend you use a txt file where every line is a URL and upload it as the startUrls file.
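Producing such a file is straightforward. A minimal sketch, assuming one URL per line with blank entries dropped; the file name start_urls.txt and the helper write_url_file are placeholders, and how the file is then attached to the actor input depends on the Apify input form:

```python
import tempfile
from pathlib import Path


def write_url_file(urls, path):
    """Write one URL per line, skipping blank entries; return the number written."""
    cleaned = [u.strip() for u in urls if u.strip()]
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return len(cleaned)


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "start_urls.txt"
    count = write_url_file(
        [
            "https://shop.nordstrom.com/s/silkn-infinity-hair-removal-device/5376009",
            "  ",
            "https://shop.nordstrom.com/s/vapour-soft-focus-foundation/5850728",
        ],
        target,
    )
    lines = target.read_text(encoding="utf-8").splitlines()
    print(count, "URLs written")
```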
The txt approach you suggested is not covered in the documentation on how to run an actor. Am I missing something?
On Apify, you have this option in the input section:
Actor Metrics
9 monthly users
-
3 stars
>99% runs succeeded
Created in Dec 2019
Modified a month ago