
Website Content Crawler
Pricing
Pay per usage

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.
simple page is throwing an error
Open
Here is an example of a page which isn't being crawled properly.
https://www.bestbuy.com/site/help-topics/zip-payments/pcmcat1678205761116.c?id=pcmcat1678205761116
The crawler throws a ton of errors and just returns the URL as the output.

Hey, thanks for using Website Content Crawler!
The problem might be that the URL shows a country selection splash screen. Use this URL instead, with &intl=nosplash appended: https://www.bestbuy.com/site/help-topics/zip-payments/pcmcat1678205761116.c?id=pcmcat1678205761116&intl=nosplash
Please see my test Actor run: https://console.apify.com/view/runs/fduJ69lkSYeNJO2kt
Jakub
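For anyone applying this workaround across many URLs, the parameter can be appended programmatically. The following is a minimal sketch using only Python's standard library; the helper name `with_query_param` is hypothetical, and `intl=nosplash` is specific to this site, so it should not be assumed to work elsewhere.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query_param(url: str, key: str, value: str) -> str:
    """Return `url` with the query parameter `key` set to `value`.

    Hypothetical helper for illustration: any existing occurrence of
    `key` is dropped first, so the parameter is never duplicated.
    """
    parts = urlsplit(url)
    # Keep every existing parameter except the one being set.
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != key
    ]
    query.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


page = (
    "https://www.bestbuy.com/site/help-topics/zip-payments/"
    "pcmcat1678205761116.c?id=pcmcat1678205761116"
)
print(with_query_param(page, "intl", "nosplash"))
```

Parsing and rebuilding the query string, rather than concatenating `&intl=nosplash`, keeps the result valid for URLs that have no query string yet or that already carry the parameter.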
burgundy_zebra
Hi Jakub,
Appreciate that you looked into it.
We scrape a lot of different web pages for different customers. As a result, we can't identify the escape hatch for each company to bypass the country check. Instead, I was expecting the countrycode configuration to have the same effect. Why is that not working?
See attached where I used US for the run I shared earlier.

Hey,
I tested the crawler using the US residential proxy configuration as shown in your screenshot, but the site still requires &intl=nosplash to bypass the country selection screen.
The Website Content Crawler is a generic tool designed for most websites, but some, like this one, need specific workarounds to handle features like the country selection screen, which the crawler doesn't support. You might find Best Buy scrapers in the Apify Store, though they focus on product scraping, not help pages: https://apify.com/store/categories?search=bestbuy.
Jakub
burgundy_zebra
We don't scrape products. We monitor pages on behalf of our customers to see if they fall out of compliance. As this screenshot shows (https://cdn.zappy.app/93fe6e185ea49cc3c9cf21830e6e9818.png), I don't need to add intl=nosplash and the page still works, because my IP address is in the US. So the crawler's source IP needs to be a US IP. How do we get Apify to use a US-based IP address for crawling?

Hi,
Apologies for the delayed response.
You can set the Apify proxy country under Crawler Settings > Proxy Configuration > Proxy Country by selecting United States. I tested this input option, and it now works without &intl=nosplash using the US proxy: https://console.apify.com/view/runs/XLg7RLb9JNgrTN6RR
Please let me know if this solves your issue.
Jakub
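When runs are started through the API rather than the Console, the same proxy country setting can be expressed in the Actor's JSON input. The sketch below only builds and prints that input. The field names (`startUrls`, `proxyConfiguration`, `useApifyProxy`, `apifyProxyGroups`, `apifyProxyCountry`) reflect Apify's proxy input format as I understand it and should be checked against the Actor's input schema; `build_run_input` is a hypothetical helper, not part of any Apify library.

```python
import json
from typing import Any, Dict, List, Optional


def build_run_input(
    url: str,
    country: str,
    proxy_groups: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build a crawler input that routes requests through a given country.

    Field names are assumptions based on Apify's proxy input format;
    verify them against the Actor's input schema before relying on them.
    """
    if len(country) != 2 or not country.isalpha():
        raise ValueError("country must be a two-letter code such as 'US'")

    proxy: Dict[str, Any] = {
        "useApifyProxy": True,
        "apifyProxyCountry": country.upper(),
    }
    if proxy_groups:
        # For example ["RESIDENTIAL"], as in the run discussed above.
        proxy["apifyProxyGroups"] = list(proxy_groups)

    return {
        "startUrls": [{"url": url}],
        "proxyConfiguration": proxy,
    }


run_input = build_run_input(
    "https://www.bestbuy.com/site/help-topics/zip-payments/"
    "pcmcat1678205761116.c?id=pcmcat1678205761116",
    "US",
    proxy_groups=["RESIDENTIAL"],
)
print(json.dumps(run_input, indent=2))
```

The resulting dictionary could then be passed as the run input when starting the Actor through an API client. That call is left out here so the sketch stays runnable without a token or network access.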