Yelp Reviews Scraper - Sentiment, Topics, Competitor Delta
Pricing
from $10.00 / 1,000 business summaries
Bulk Yelp review scraper with per-review sentiment, topic clusters (food, service, wait, value, parking), responder tracking, 12-month trend and competitor delta. LLM-ready JSON for reputation, local SEO and chain ops teams.
Yelp Reviews Pro
TL;DR for local SEO agencies, multi-unit franchise managers, and reputation-management teams: pulls Yelp reviews for one or many businesses with built-in sentiment, 9-topic clustering, owner-response tracking, a 12-month trend label, and pairwise competitor delta. Compared to a generic Yelp scraper, you get an intelligence layer on top: sentiment with negation handling, topic tags, a response-gap metric, competitor delta, and an llm_ready Markdown summary mode for AI agents. The free Apify plan's $5 platform credit covers small runs on a handful of businesses. PPE charges scale per review. Upgrade to Apify Starter ($49/mo) for production volume.
Run it in 30 seconds
    # Via the Apify Python SDK
    from apify_client import ApifyClient

    client = ApifyClient("<YOUR_APIFY_TOKEN>")
    run = client.actor("seibs.co/yelp-reviews-pro").call(run_input={
        "mode": "single_business_deep",  # mode name as listed in the Modes table
        "business_inputs": ["https://www.yelp.com/biz/the-french-laundry-yountville"],
        "max_reviews_per_business": 200,
        "include_topic_clustering": True,  # Python boolean, not JSON true
    })

    # Stream the resulting dataset records
    for item in client.dataset(run["defaultDatasetId"]).iterate_items():
        print(item)
Or via curl:
    curl -X POST "https://api.apify.com/v2/acts/seibs.co~yelp-reviews-pro/run-sync-get-dataset-items?token=<YOUR_APIFY_TOKEN>" \
      -H "Content-Type: application/json" \
      -d '{"mode": "single_business_deep", "business_inputs": ["https://www.yelp.com/biz/the-french-laundry-yountville"], "max_reviews_per_business": 200, "include_topic_clustering": true}'
Or click "Try for free" on this page if you prefer the no-code UI.
What you get
Each run produces:
- A clean dataset, filterable in the Apify console and downloadable as CSV or JSON
- An OUTPUT.html dashboard preview of your top records
- A sample-output preview at ./.actor/sample-output.json
Per-archetype custom artifacts shipped with this actor:
- top-negative-reviews.html (with suggested response templates and copy-to-clipboard buttons)
- sentiment-trend.csv (12-month trend with improving / flat / declining label)
- competitor-delta.csv (pairwise rating, response-rate, and top-complaint diff)
What does Yelp Reviews Pro do?
It wraps the agents/yelp-reviews upstream actor and layers an analysis pass on top: per-review sentiment with negation handling, topic tagging across nine common categories, owner-response stats, 12-month time-series with improving / flat / declining trend label, and pairwise competitor delta. Optional LLM-ready markdown summaries drop straight into agent prompts.
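The rule behind the improving / flat / declining label is not documented here. As a rough illustration only, the sketch below compares the mean rating of the most recent months with the months just before them. The function name `label_trend`, the 3-month window, and the 0.1-star threshold are all assumptions, not the actor's actual values.

```python
from statistics import mean


def label_trend(monthly_avg_ratings, window=3, threshold=0.1):
    """Label a chronological list of monthly average ratings.

    Compares the mean of the last `window` months with the mean of the
    `window` months before them. Both `window` and `threshold` are
    illustrative assumptions.
    """
    if len(monthly_avg_ratings) < 2 * window:
        return "flat"  # too little history to call a direction
    recent = mean(monthly_avg_ratings[-window:])
    previous = mean(monthly_avg_ratings[-2 * window:-window])
    delta = recent - previous
    if delta > threshold:
        return "improving"
    if delta < -threshold:
        return "declining"
    return "flat"


# Twelve months of made-up averages, oldest first
print(label_trend([4.0] * 9 + [4.5, 4.5, 4.5]))
```

A windowed comparison like this is less jumpy than comparing two single months, which matters for low-volume businesses where one review can move a monthly average a lot.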
AI / RAG / Agent
Built for AI reputation-management agents and local-SEO bots. Set output_format=llm_ready to get a pre-summarized Markdown block per business (rating trend, top complaints, top praise, response gap, competitor delta) that a model can ingest in a single prompt. Per-review records carry sentiment, review_topics, reviewer_is_elite, and useful_count as embedding metadata. Compatible with LangChain, LlamaIndex, Pinecone, Weaviate, Chroma, and any MCP-aware agent runtime.
    from apify_client import ApifyClient

    client = ApifyClient("<YOUR_APIFY_TOKEN>")
    run = client.actor("seibs.co/yelp-reviews-pro").call(run_input={
        "mode": "batch_analysis",
        "business_inputs": [
            "https://www.yelp.com/biz/joes-pizza-new-york-9",
            "https://www.yelp.com/biz/prince-street-pizza-new-york",
        ],
        "max_reviews_per_business": 200,
        "review_sort": "newest",
        "output_format": "llm_ready",
    })

    # Print only the per-business Markdown summaries
    for item in client.dataset(run["defaultDatasetId"]).iterate_items():
        if item.get("record_type") == "business_summary":
            print(item["llm_summary_md"])
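Once the summaries are in hand, a small helper can pack them into a single prompt. This is a minimal sketch: `build_prompt`, the separator, and the `max_chars` budget are illustrative choices, and only the `record_type` and `llm_summary_md` fields come from this README.

```python
def build_prompt(items, question, max_chars=12000):
    """Join llm_summary_md blocks from business_summary records into one prompt.

    `max_chars` is an assumed budget; tune it to your model's context window.
    """
    blocks = []
    used = 0
    for item in items:
        if item.get("record_type") != "business_summary":
            continue  # skip any non-summary records
        block = item.get("llm_summary_md", "").strip()
        if not block:
            continue
        if used + len(block) > max_chars:
            break  # stop before exceeding the budget
        blocks.append(block)
        used += len(block)
    context = "\n\n---\n\n".join(blocks)
    return f"{context}\n\nQuestion: {question}"


# Made-up records standing in for dataset items
demo_items = [
    {"record_type": "business_summary", "llm_summary_md": "# Business A"},
    {"record_type": "business_summary", "llm_summary_md": "# Business B"},
]
print(build_prompt(demo_items, "Which business has the larger response gap?"))
```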
Features
- Per-review sentiment (positive / neutral / negative) with sentiment_score in [-1, 1] - lexicon-based, no API key, no per-call cost.
- Topic clustering - food_quality, service, wait_time, cleanliness, value_pricing, ambience, staff_friendliness, parking, accessibility, aggregated into topic_distribution and ranked into top_complaint_topics / top_praise_topics.
- Responder tracking - business-owner response rate, average time-to-respond in days, owner reply sentiment distribution.
- Time-series breakdown - last 12 months of review counts + average rating, with derived recent_trend.
- Competitor delta mode - given 2-5 businesses, returns pairwise rating delta, review-count delta, common complaints, unique complaints per side.
- LLM-ready output - set output_format=llm_ready and every record gets an llm_summary_md markdown block.
- Yelp-native reviewer signals - reviewer_is_elite, reviewer_review_count, reviewer_friend_count, reviewer_photo_count (shill / credibility weighting).
- Review reactions - useful_count, funny_count, cool_count per review.
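To make the first feature concrete: the FAQ describes the classifier as lexicon-based with a 2-token negation lookback. The sketch below shows that general technique. It is not the actor's implementation; the tiny lexicon, the negator list, the averaging, and the neutral band are all assumptions chosen for illustration.

```python
import re

# Tiny illustrative lexicon; a production lexicon would hold far more terms.
LEXICON = {"great": 1.0, "best": 1.0, "good": 0.6, "friendly": 0.6,
           "bad": -0.6, "slow": -0.5, "rude": -0.8, "terrible": -1.0}
NEGATORS = {"not", "no", "never", "wasn't", "isn't", "don't", "didn't"}


def sentiment_score(text, lookback=2):
    """Average lexicon weight in [-1, 1], flipping hits that follow a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total, hits = 0.0, 0
    for i, tok in enumerate(tokens):
        weight = LEXICON.get(tok)
        if weight is None:
            continue
        # Flip polarity if a negator sits within `lookback` tokens before the hit.
        if any(t in NEGATORS for t in tokens[max(0, i - lookback):i]):
            weight = -weight
        total += weight
        hits += 1
    return max(-1.0, min(1.0, total / hits)) if hits else 0.0


def sentiment_label(score, neutral_band=0.1):
    """Map a score to a label; the neutral band width is an assumption."""
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


for text in ("The food was great", "The service was not good"):
    score = sentiment_score(text)
    print(text, "->", score, sentiment_label(score))
```

A fixed lookback is cheap and needs no model, but it misses negators that sit further away: in "not really that good" the negator is three tokens back, so this sketch still scores it positive.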
Use cases
- Chain reputation managers monitoring 20-500 Yelp locations who currently glue together a raw review scrape + sentiment notebook + dashboard.
- Local SEO agencies running monthly client reports - drop the LLM markdown straight into the report template.
- Restaurant groups / multi-location service businesses with response-rate / time-to-respond / recent-trend KPIs.
- Competitive intel teams - competitor_delta answers "what do customers complain about us vs the competition?"
- AI product builders feeding local-business data into LLM workflows who need pre-summarized markdown, not raw JSON.
FAQ
Q: Is this legal?
A: Yes - Yelp reviews are public and we go through the upstream agents/yelp-reviews actor, which scrapes the public Yelp frontend (not the paid Yelp Fusion API). Use the data per Yelp's Terms of Service and applicable law.
Q: Why might a run fail?
A: (1) Yelp's anti-bot blocks the session - the row comes back with available: false and a reason. RESIDENTIAL proxy is mandatory; datacenter IPs are rejected. (2) Business URL has no /biz/<slug> segment - upstream returns nothing. (3) Pushing max_reviews_per_business above 500 on many businesses at once triggers rate limits - lower concurrency or split the run.
Q: How fresh is the data?
A: Live at crawl time. Reviews are read directly from the public Yelp page during the run. review_sort: newest returns most recent first - typically within minutes of being posted.
Q: Can I schedule this daily or weekly?
A: Yes. Weekly is the standard cadence for chain reputation monitoring; daily suits crisis-watch on a single high-volume business. Use Apify Schedules, and combine them with recent_trend to alert when a business flips to declining.
Q: How do I push results into a CRM or BI tool?
A: Two paths. (1) output_format: csv_friendly flattens reviews for direct import into BI dashboards (Looker, Power BI, Sheets). (2) output_format: llm_ready drops llm_summary_md straight into agent prompts or a client report template. Zapier/Make/n8n forward business-summary records to HubSpot, Salesforce, or a Slack channel on negative-trend alerts.
Q: How is this different from agents/yelp-reviews or tri_angle/yelp-review-scraper?
A: Those are the upstream raw-scraper layer - they pull reviews and exit. This actor wraps that scrape and layers an intelligence pass on top: per-review sentiment with negation handling, 9-topic clustering with aggregated top_complaint_topics / top_praise_topics, owner-responder metrics (response rate, time-to-respond, reply sentiment), 12-month time-series with improving / flat / declining trend label, pairwise competitor delta, and LLM-ready markdown summaries. You are paying for the analysis layer, not the scrape - if all you need is raw reviews, use the upstream directly.
Q: How accurate is the sentiment classifier?
A: ~85% on English consumer reviews against human labels in spot-check sets; lexicon-based with 2-token negation lookback. Non-English (es, fr, de) runs ~70-75%.
Q: How does PPE pricing actually work here?
A: $0.010 per business_summary, $0.001 per review_record, $0.020 per competitor_delta_record, $0.005 per llm_summary. A 100-review business in JSON mode is about $0.11; a 5-business competitor delta with 100 reviews each is about $0.65.
Related Actors
- Pair with any lead-finder actor (home-services-lead-finder, restaurants-lead-finder, salon-spa-lead-finder, etc.) - those build the lead list, this actor monitors each lead's Yelp reputation as a companion intelligence layer.
- google-maps-reviews-pro - same intelligence layer applied to Google Maps reviews. Run both for cross-platform sentiment + topic + responder coverage.
- reddit-topic-watcher - extend reputation monitoring beyond review platforms to Reddit complaint and praise threads.
Integrations
- Zapier - push to HubSpot/Salesforce/Pipedrive/Apollo/Klaviyo
- Make.com - workflow automation
- n8n - self-hosted automation
- Apify webhooks - POST to your endpoint
- API + dataset export (JSON/CSV/Excel/XML)
- MCP / AI agents - call from Claude/GPT/LangChain
Modes
| Mode | What it does | Inputs |
|---|---|---|
| batch_analysis | Independent analysis of N businesses (up to 50). | List of Yelp URLs or business IDs. |
| competitor_delta | Pairwise comparison of 2-5 businesses. | 2-5 inputs. |
| single_business_deep | Max depth on one business; bumps max_reviews to 500+. | First input only. |
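To illustrate what a pairwise comparison involves, the sketch below builds one delta per unordered pair of business summaries. It reuses field names from the sample output in this README (business_name, current_rating, total_review_count, top_complaint_topics). The function name and the shape of the returned dictionaries are hypothetical, not the actor's competitor_delta record schema, and the second business's numbers are made up.

```python
from itertools import combinations


def competitor_deltas(summaries):
    """Return one delta dict per unordered pair of business summaries."""
    if not 2 <= len(summaries) <= 5:
        raise ValueError("competitor_delta expects 2-5 businesses")
    deltas = []
    for a, b in combinations(summaries, 2):
        complaints_a = set(a.get("top_complaint_topics", []))
        complaints_b = set(b.get("top_complaint_topics", []))
        deltas.append({
            "pair": (a["business_name"], b["business_name"]),
            "rating_delta": round(a["current_rating"] - b["current_rating"], 2),
            "review_count_delta": a["total_review_count"] - b["total_review_count"],
            "common_complaints": sorted(complaints_a & complaints_b),
            "unique_to_a": sorted(complaints_a - complaints_b),
            "unique_to_b": sorted(complaints_b - complaints_a),
        })
    return deltas


demo = [
    {"business_name": "Joe's Pizza", "current_rating": 4.5,
     "total_review_count": 8421, "top_complaint_topics": ["wait_time"]},
    # Made-up comparison business
    {"business_name": "Example Slice Shop", "current_rating": 4.0,
     "total_review_count": 5000, "top_complaint_topics": ["wait_time", "parking"]},
]
print(competitor_deltas(demo)[0])
```

With n businesses this yields n(n-1)/2 pairs, so 5 inputs produce 10 comparisons under this all-pairs assumption.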
Input
See .actor/INPUT_SCHEMA.json. Sample:
    {
      "mode": "batch_analysis",
      "business_inputs": [
        "https://www.yelp.com/biz/joes-pizza-new-york-9",
        "prince-street-pizza-new-york"
      ],
      "max_reviews_per_business": 100,
      "review_sort": "newest",
      "include_sentiment": true,
      "include_topic_clustering": true,
      "include_time_series": true,
      "output_format": "json",
      "apify_proxy_groups": ["RESIDENTIAL"],
      "concurrency": 4
    }
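Because the FAQ notes that a URL without a /biz/<slug> segment returns nothing, it can be worth checking inputs before spending a run. The following is a minimal pre-flight sketch. The helper names, the identifier pattern, and the per-mode limits table (taken from the Modes section) are assumptions; the actor's own validation may differ.

```python
import re
from urllib.parse import urlparse

# Assumed shape for a bare slug or business ID (letters, digits, _ and -).
IDENT_RE = re.compile(r"^[A-Za-z0-9_\-]+$")
# Input-count limits per mode, as described in the Modes table.
MODE_LIMITS = {"batch_analysis": (1, 50), "competitor_delta": (2, 5)}


def extract_identifier(business_input):
    """Return the slug from a /biz/<slug> URL, or a bare slug/ID, else None."""
    value = business_input.strip()
    if "://" in value:
        parts = [p for p in urlparse(value).path.split("/") if p]
        if len(parts) >= 2 and parts[0] == "biz":
            value = parts[1]
        else:
            return None  # URL without a /biz/<slug> segment
    return value if IDENT_RE.match(value) else None


def validate_input(run_input):
    """Return a list of human-readable problems; empty means it looks fine."""
    problems = []
    inputs = run_input.get("business_inputs", [])
    low, high = MODE_LIMITS.get(run_input.get("mode"), (1, None))
    if len(inputs) < low or (high is not None and len(inputs) > high):
        problems.append(f"mode {run_input.get('mode')!r} got {len(inputs)} inputs")
    for raw in inputs:
        if extract_identifier(raw) is None:
            problems.append(f"no /biz/<slug> or identifier in {raw!r}")
    return problems


print(validate_input({
    "mode": "batch_analysis",
    "business_inputs": ["https://www.yelp.com/biz/joes-pizza-new-york-9",
                        "prince-street-pizza-new-york"],
}))
```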
Output
One record per business with the analysis layer attached. Sample:
    {
      "record_type": "business_summary",
      "business_name": "Joe's Pizza",
      "yelp_url": "https://www.yelp.com/biz/joes-pizza-new-york-9",
      "current_rating": 4.5,
      "total_review_count": 8421,
      "sentiment_distribution": {"positive_pct": 78.0, "negative_pct": 9.0},
      "top_praise_topics": ["food_quality", "value_pricing"],
      "top_complaint_topics": ["wait_time"],
      "responder_metrics": {"response_rate": 12.0, "avg_response_time_days": 4.8},
      "recent_trend": "improving",
      "reviews": [
        {
          "rating": 5,
          "text": "Best slice in NY",
          "sentiment": "positive",
          "sentiment_score": 0.82,
          "review_topics": ["food_quality"],
          "reviewer_is_elite": true,
          "useful_count": 12,
          "funny_count": 2,
          "cool_count": 4
        }
      ],
      "available": true,
      "scraped_at": "2026-05-16T12:00:00Z"
    }
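If you take the nested JSON shape above and want one row per review for a spreadsheet, a local flattening step is short. This sketch is a client-side approximation built only from the field names in the sample record; it is not the actor's csv_friendly mode, and the helper names and the semicolon join for list fields are assumptions.

```python
import csv
import io

REVIEW_FIELDS = ["rating", "sentiment", "sentiment_score", "review_topics",
                 "reviewer_is_elite", "useful_count", "funny_count",
                 "cool_count", "text"]


def flatten_reviews(summary):
    """Turn one business_summary record into a list of flat per-review rows."""
    rows = []
    for review in summary.get("reviews", []):
        row = {"business_name": summary.get("business_name"),
               "yelp_url": summary.get("yelp_url")}
        for field in REVIEW_FIELDS:
            value = review.get(field)
            if isinstance(value, list):
                value = ";".join(value)  # lists become one delimited cell
            row[field] = value
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize flat rows to CSV text with a header line."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


sample = {
    "record_type": "business_summary",
    "business_name": "Joe's Pizza",
    "yelp_url": "https://www.yelp.com/biz/joes-pizza-new-york-9",
    "reviews": [{"rating": 5, "text": "Best slice in NY", "sentiment": "positive",
                 "sentiment_score": 0.82, "review_topics": ["food_quality"],
                 "reviewer_is_elite": True, "useful_count": 12,
                 "funny_count": 2, "cool_count": 4}],
}
print(to_csv(flatten_reviews(sample)))
```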
Pricing
Pay-per-event:
| Event | Price | When charged |
|---|---|---|
| business_summary | $0.010 | Once per business successfully analyzed. |
| review_record | $0.001 | Once per individual review extracted. |
| competitor_delta_record | $0.020 | Once per pairwise comparison. |
| llm_summary | $0.005 | Once per business when output_format=llm_ready. |
Typical 100-review business in JSON mode: $0.11. 5-business competitor_delta with 100 reviews each: $0.65.
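The arithmetic behind these figures can be sketched with the event prices from the table. The function and parameter names are illustrative. The number of competitor_delta_record events is left as an explicit parameter because this page does not state how many a given run emits: the $0.65 figure is reproduced with five delta records, while charging all ten unordered pairs of five businesses would give $0.75.

```python
from decimal import Decimal

# Event prices copied from the pricing table above.
PRICES = {
    "business_summary": Decimal("0.010"),
    "review_record": Decimal("0.001"),
    "competitor_delta_record": Decimal("0.020"),
    "llm_summary": Decimal("0.005"),
}


def estimate_cost(businesses, reviews_per_business, delta_records=0, llm_ready=False):
    """Estimate PPE charges in USD for one run (platform usage not included)."""
    total = businesses * PRICES["business_summary"]
    total += businesses * reviews_per_business * PRICES["review_record"]
    total += delta_records * PRICES["competitor_delta_record"]
    if llm_ready:
        total += businesses * PRICES["llm_summary"]
    return total


print(estimate_cost(1, 100))                   # one 100-review business, JSON mode
print(estimate_cost(5, 100, delta_records=5))  # matches the $0.65 example
```

Decimal is used instead of float so that sums of thousandths of a dollar stay exact.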
Save your input as an Apify Task
Apify Tasks let you save a configured input once and re-run it with a single click - no need to re-enter business URLs, review limits, or output settings every time. Tasks are the foundation for everything that comes next: schedules, monitor mode, and webhook routing all attach to a saved Task, not to the raw actor.
Steps to save your current input as a Task:
- On this actor's Apify Store page, click Run with your input fully configured.
- Click the Save as task button at the top of the run page.
- Name the task something memorable (e.g. Reviews for top 10 competitors - weekly).
- Reload the task page and click Start anytime to re-run with the same inputs.
Tasks unlock the next two features below: scheduling and monitor mode.
Run this weekly with Apify Schedules
Apify Schedules cron-run any saved Task automatically. Pair this with the saved Task above and you get hands-off recurring runs with no manual clicks, no missed weeks, and a steady stream of fresh data into your CRM or warehouse.
Steps to schedule a Task:
- Save your input as a Task (see above).
- Go to https://console.apify.com/schedules and click Create new schedule.
- Pick your Task and set the cron expression. Common patterns:
  - Daily at 9am UTC: 0 9 * * *
  - Weekly on Mondays at 9am: 0 9 * * 1
  - Monthly on the 1st at 9am: 0 9 1 * *
- Save. Apify will run your Task on that schedule automatically, push the dataset to whatever integrations you have wired up, and fire run-completion webhooks for downstream automation.
Run weekly to track sentiment trends, catch negative reviews fast, and feed fresh review text into your VOC pipeline.
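A weekly run becomes more useful with a small post-processing step that surfaces only the businesses whose trend has turned. The sketch below filters dataset items on recent_trend and formats one alert line per business. The function name and the message wording are illustrative; the field names come from the sample output in this README.

```python
def declining_alerts(items):
    """Return one alert string per available business whose trend is declining."""
    alerts = []
    for item in items:
        if item.get("record_type") != "business_summary":
            continue
        if not item.get("available", True):
            continue  # blocked or failed rows carry no usable metrics
        if item.get("recent_trend") != "declining":
            continue
        complaints = ", ".join(item.get("top_complaint_topics", [])) or "n/a"
        alerts.append(
            f"{item['business_name']}: trend declining, "
            f"rating {item['current_rating']}, top complaints: {complaints}"
        )
    return alerts


# Made-up records standing in for one scheduled run's dataset
demo_run = [
    {"record_type": "business_summary", "business_name": "Location A",
     "current_rating": 3.5, "recent_trend": "declining",
     "top_complaint_topics": ["wait_time", "service"], "available": True},
    {"record_type": "business_summary", "business_name": "Location B",
     "current_rating": 4.5, "recent_trend": "improving", "available": True},
]
for line in declining_alerts(demo_run):
    print(line)
```

The resulting strings can be posted to whichever channel you already use, for example through the Zapier, Make, or n8n integrations listed earlier.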
Monitor mode (v2, beta)
Monitor mode is the v2 evolution of this actor and is currently in BETA. It turns a recurring schedule into a true change-feed instead of a firehose of duplicate records.
How it works:
- When this actor runs under an Apify Schedule, monitor mode is enabled automatically.
- Instead of emitting ALL records every run, it emits ONLY records that are NEW or CHANGED since the last scheduled run.
- A digest record summarizes the delta (X new, Y changed, Z removed) at the top of every run.
- Optional: provide a Slack or email webhook URL in the monitor_webhook_url input field and the digest fires there too, so your team gets the delta in their inbox or channel without polling the dataset.
- Cost: a single scheduled_delta_run event ($0.05) per scheduled run, plus standard PPE on emitted delta records only. Predictable monthly cost, no surprise bills from re-charging for unchanged records.
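The change-feed idea described in the list above can be sketched as a diff between two runs. This is an illustration of the general approach, not the actor's implementation: the `review_id` key is hypothetical (the sample output in this README does not show a per-review identifier), and the digest field names are made up.

```python
import hashlib
import json


def fingerprint(record):
    """Stable hash of a record's content, independent of key order."""
    payload = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_runs(previous, current, key="review_id"):
    """Return (digest, emitted) where emitted holds only new or changed records."""
    prev = {r[key]: fingerprint(r) for r in previous}
    curr = {r[key]: r for r in current}
    new = [r for k, r in curr.items() if k not in prev]
    changed = [r for k, r in curr.items()
               if k in prev and fingerprint(r) != prev[k]]
    removed = [k for k in prev if k not in curr]
    digest = {"record_type": "digest", "new": len(new),
              "changed": len(changed), "removed": len(removed)}
    return digest, new + changed


last_week = [{"review_id": "r1", "rating": 5}, {"review_id": "r2", "rating": 2}]
this_week = [{"review_id": "r1", "rating": 5}, {"review_id": "r2", "rating": 3},
             {"review_id": "r3", "rating": 4}]
digest, emitted = diff_runs(last_week, this_week)
print(digest, emitted)
```

Hashing the whole record keeps the comparison simple, at the price of flagging a record as changed when any field moves, including counters such as useful_count.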
Monitor mode is rolling out to three actors first: hotel-motel-lead-finder, google-maps-reviews-pro, and mcp-accounting-firm-leads. Full portfolio coverage is planned by end of June.
Support
Open an issue on the actor's GitHub or contact via Apify Store. Include the run ID and input config.
Changelog
See ./CHANGELOG.md.
Found this useful?
If this actor saved you time or money, please consider leaving a quick review on the Apify Store. Reviews help other buyers find work that solves their problem and let me prioritize the features paying customers actually use. Leave a review: https://apify.com/seibs.co/yelp-reviews-pro#reviews