Yelp Business Info Scraper

1 day trial then $25.00/month - No credit card required now

delicious_zebu/yelp-business-info-scraper

Quickly gather rich, detailed data from Yelp business pages—perfect for insights and analysis! 🚀

Developer
Maintained by Community

Actor Metrics

  • 19 Monthly users

  • No reviews yet

  • 18 bookmarks

  • >99% runs succeeded

  • 15 hours response time

  • Created in Nov 2024

  • Modified 8 days ago

Error handling

Closed
Competent Path (competent_path) opened this issue
9 days ago

Sometimes I end up with invalid URLs, but I have no way to tell whether a URL is valid until I scrape it. Example: https://www.yelp.com/biz/complete-office-cleaning-orange

At the moment the scraper does not report any error for such URLs; it completes without returning any results.

Could you please add an empty result, an error flag, or some other indicator?

delicious_zebu:

Hi, I have added a new field called "is_page_not_found", a flag indicating whether the page for the current parameter is invalid. It is set to True if the parameter is invalid and to False if the parameter is valid.

You can test with the latest version: 0.0.13.
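
As a rough illustration of how the flag could be consumed on the client side, here is a minimal Python sketch. Only the field name is_page_not_found comes from this thread; the "url" key and the sample items are hypothetical, and the function is not part of the Actor.

```python
from typing import Any


def split_by_page_status(
    items: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Separate items for valid pages from items flagged as not found."""
    valid: list[dict[str, Any]] = []
    not_found: list[dict[str, Any]] = []
    for item in items:
        # Assumption: items without the flag (e.g. from older versions)
        # are treated as valid pages.
        if item.get("is_page_not_found", False):
            not_found.append(item)
        else:
            valid.append(item)
    return valid, not_found


# Hypothetical dataset items; the "url" key is an assumption.
items = [
    {"url": "https://www.yelp.com/biz/kings-of-vapor-akron-3",
     "is_page_not_found": False},
    {"url": "https://www.yelp.com/biz/complete-office-cleaning-orange",
     "is_page_not_found": True},
]
valid, not_found = split_by_page_status(items)
print(len(valid), "valid,", len(not_found), "not found")
```

Items in the second list can then be dropped or logged instead of being retried, since a retry would not make an invalid URL valid.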

competent_path:

Thanks a lot. I just came across a scenario where it returned only 3 results when 4 were requested.

Input:

{
  "Urls": [
    "https://www.yelp.com/biz/kings-of-vapor-and-smoke-akron-4",
    "https://www.yelp.com/biz/kings-of-vapor-and-smoke-akron-6",
    "https://www.yelp.com/biz/kings-of-vapor-akron-3",
    "https://www.yelp.com/biz/kings-of-vapor-cuyahoga-falls-2"
  ]
}

Logs:

2025-03-05T19:45:31.357Z ACTOR: Pulling Docker image of build vzVGBFI9NJwOfQihZ from repository.
2025-03-05T19:45:31.461Z ACTOR: Creating Docker container.
2025-03-05T19:45:31.510Z ACTOR: Starting Docker container.
2025-03-05T19:45:32.808Z [apify] INFO  Initializing Actor...
2025-03-05T19:45:32.810Z [apify] INFO  System info ({"apify_sdk_version": "2.3.1", "apify_client_version": "1.9.2", "crawlee_version": "0.5.4", "python_version": "3.12.9", "os": "linux"})
2025-03-05T19:45:32.837Z [apify] INFO  Hello from the Actor!
2025-03-05T19:46:07.441Z [apify] ERROR Error occurred while requesting parameters: https://www.yelp.com/biz/kings-of-vapor-cuyahoga-falls-2, skipping. Error: RetryError[<Future at 0x74bb07d86f30 state=finished raised Exception>]
2025-03-05T19:46:07.442Z [apify] INFO  Exiting Actor ({"exit_code": 0})

delicious_zebu:

Hi, your issue occurs because the Actor encountered a CAPTCHA during execution. The Actor will automatically retry up to 10 times, and if it still cannot bypass the CAPTCHA, it will throw an error and skip that parameter. In this case, you need to reinsert the failed parameters into the Actor and run the extraction again.

Note: I've been quite busy lately, but I may optimize this behavior in the future.
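
Until then, one possible client-side workaround is to compare the requested URLs with the URLs present in the results and build a new input from the difference. The Python sketch below assumes each result item carries the page URL under a "url" key; that key is hypothetical and may need adjusting to the actual output. Only the "Urls" input key is taken from the input shown above.

```python
from typing import Any


def build_retry_input(
    requested_urls: list[str],
    items: list[dict[str, Any]],
    url_field: str = "url",  # assumed name of the URL field in the output
) -> dict[str, list[str]]:
    """Return an Actor input containing only the URLs with no result."""
    returned = {item.get(url_field) for item in items}
    # Keep the original request order for the URLs that are missing.
    missing = [url for url in requested_urls if url not in returned]
    return {"Urls": missing}


requested = [
    "https://www.yelp.com/biz/kings-of-vapor-and-smoke-akron-4",
    "https://www.yelp.com/biz/kings-of-vapor-and-smoke-akron-6",
    "https://www.yelp.com/biz/kings-of-vapor-akron-3",
    "https://www.yelp.com/biz/kings-of-vapor-cuyahoga-falls-2",
]
# Hypothetical results: three of the four requested pages came back.
items = [{"url": url} for url in requested[:3]]

retry_input = build_retry_input(requested, items)
print(retry_input)
```

The comparison is an exact string match, so it would miss cases where the output URL differs from the requested one (for example after a redirect or with a trailing slash); some normalization may be needed in practice.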

competent_path:

Thanks. I understand you won't be able to handle 100% of cases all the time. I just wanted the Actor to return 4 results when 4 results have been requested, and to indicate that one failed due to a CAPTCHA or something else. That way I have a reliable way on my end to know that I need to retry.

delicious_zebu:

Hi, you can check which parameters failed through the log messages in the Log Panel, and then simply rerun those parameters.
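
If reading the log by hand is impractical, the failed parameters could also be extracted from the log text programmatically. The Python sketch below relies on the wording of the error line shown in the logs above ("Error occurred while requesting parameters: <url>, skipping"); that wording is not a documented contract and may change between versions, so treat the pattern as an assumption.

```python
import re

# Pattern based on the error line seen in this thread's log output.
FAILED_URL_PATTERN = re.compile(
    r"Error occurred while requesting parameters: (\S+), skipping"
)


def failed_urls_from_log(log_text: str) -> list[str]:
    """Return the URLs that the log reports as skipped, in log order."""
    return FAILED_URL_PATTERN.findall(log_text)


log_text = (
    "2025-03-05T19:45:32.837Z [apify] INFO  Hello from the Actor!\n"
    "2025-03-05T19:46:07.441Z [apify] ERROR Error occurred while requesting "
    "parameters: https://www.yelp.com/biz/kings-of-vapor-cuyahoga-falls-2, "
    "skipping. Error: RetryError[<Future at 0x74bb07d86f30 state=finished "
    "raised Exception>]\n"
    "2025-03-05T19:46:07.442Z [apify] INFO  Exiting Actor "
    '({"exit_code": 0})\n'
)

failed = failed_urls_from_log(log_text)
print(failed)
```

The resulting list can be passed back to the Actor as the "Urls" input for a rerun.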