Yelp Events
Pricing
from $0.01 / 1,000 results
Developer
Payam Pirooznia
Maintained by Community
Yelp Events Apify Actor
A standalone Apify actor that extracts public event listings from Yelp city events pages. It is designed for BFU-Feeder ingestion, but the repository is intended to be developed, tested, and deployed independently.
What It Does
| Step | Description |
|---|---|
| 1. Navigate | Load Yelp events browse pages (e.g., /events/austin-tx-us/browse?official=0) |
| 1b. Throttle | Enforce a 500 ms same-domain delay between Yelp page requests to reduce firewall/challenge risk |
| 2. Classify | Detect page type: events list, event detail, blocked/challenge, unsupported layout |
| 3. Parse list | Extract event cards using multi-strategy selectors |
| 4. Enrich (optional) | Follow event detail links for richer data |
| 5. Map & emit | Merge card + detail data into YelpEventRecord aligned with BFU-Feeder RawEventContract |
| 6. Diagnostics | Save per-source and global run metrics to key-value store |
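The throttle in step 1b can be pictured with a small sketch. This is not taken from the actor's source: the `SameDomainThrottle` class and its `reserve` method are hypothetical names, and only the 500 ms same-domain delay comes from the table above. The clock is passed in explicitly so the behaviour is easy to check.

```typescript
// Hypothetical helper, not the actor's implementation.
// Tracks the last scheduled request time per hostname.
class SameDomainThrottle {
  private readonly minDelayMs: number;
  private readonly lastRequestAt = new Map<string, number>();

  constructor(minDelayMs: number = 500) {
    this.minDelayMs = minDelayMs;
  }

  // Returns how many milliseconds to wait before requesting `url`,
  // and records the time at which that request is scheduled to go out.
  reserve(url: string, nowMs: number): number {
    const host = new URL(url).hostname;
    const last = this.lastRequestAt.get(host);
    const earliest =
      last === undefined ? nowMs : Math.max(nowMs, last + this.minDelayMs);
    this.lastRequestAt.set(host, earliest);
    return earliest - nowMs;
  }
}
```

A caller would sleep for the returned number of milliseconds before navigating. Requests to other hostnames are not delayed by Yelp traffic.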
Input
| Field | Type | Default | Description |
|---|---|---|---|
| `startUrls` | `StartUrl[]` | required | Yelp events page URLs |
| `maxItems` | integer | `200` | Max total events (`0` = unlimited) |
| `maxEventsPerSource` | integer | `100` | Max events per source URL (`0` = unlimited) |
| `categories` | `string[]` | `undefined` | Optional category filter (see below) |
| `includeDetailPages` | boolean | `true` | Follow event detail links for enrichment |
| `proxyConfiguration` | object | `undefined` | Apify proxy settings |
| `browserType` | `"chromium" \| "firefox" \| "webkit"` | `"chromium"` | Browser engine used by Playwright |
| `browserUserAgents` | `string[]` | built-in pool | Optional custom user-agent pool; one is selected randomly per run |
| `browserViewportWidth` | integer | `1366` | Browser viewport width in pixels |
| `browserViewportHeight` | integer | `768` | Browser viewport height in pixels |
| `browserJavaScriptEnabled` | boolean | `true` | Enable JavaScript in browser context |
| `browserIgnoreHTTPSErrors` | boolean | `true` | Ignore TLS/HTTPS certificate errors |
| `debugMode` | boolean | `false` | Verbose logging |
| `saveDebugHtml` | boolean | `false` | Save raw HTML to key-value store |
| `saveDebugScreenshots` | boolean | `false` | Save page screenshots |
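As an illustration of how the defaults in the table combine with user input, here is a minimal sketch. Field names and default values come from the table; the `ActorInput` type (a subset of the fields), `INPUT_DEFAULTS`, and `withDefaults` are hypothetical and may not match the actor's actual input handling.

```typescript
// Sketch only: a subset of the input fields listed in the table above.
interface StartUrl {
  url: string;
}

interface ActorInput {
  startUrls: StartUrl[];
  maxItems?: number;
  maxEventsPerSource?: number;
  categories?: string[];
  includeDetailPages?: boolean;
  browserType?: "chromium" | "firefox" | "webkit";
  browserViewportWidth?: number;
  browserViewportHeight?: number;
}

// Default values as documented in the table.
const INPUT_DEFAULTS = {
  maxItems: 200,
  maxEventsPerSource: 100,
  includeDetailPages: true,
  browserType: "chromium" as "chromium" | "firefox" | "webkit",
  browserViewportWidth: 1366,
  browserViewportHeight: 768,
};

// The later spread wins, so any field present in `input` overrides its default.
function withDefaults(input: ActorInput) {
  return { ...INPUT_DEFAULTS, ...input };
}

const exampleInput: ActorInput = {
  startUrls: [
    { url: "https://www.yelp.com/events/austin-tx-us/browse?official=0" },
  ],
  maxItems: 10,
  categories: ["Music"],
};

const resolvedInput = withDefaults(exampleInput);
```

Here `resolvedInput.maxItems` is the user-supplied `10`, while `maxEventsPerSource` falls back to `100`.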
Example Start URLs
    https://www.yelp.com/events/mission-viejo-ca-us/browse?official=0
    https://www.yelp.com/events/austin-tx-us/browse?official=0
    https://www.yelp.com/events/san-diego-ca-us/browse?official=0
Category Filter
The optional `categories` field filters events by Yelp's event category. When provided,
each start URL is expanded into one request per selected category by appending `&c=<id>`.
| Category | ID |
|---|---|
| Other | 0 |
| Music | 1 |
| Visual Arts | 2 |
| Performing Arts | 3 |
| Film | 4 |
| Lectures & Books | 5 |
| Fashion | 6 |
| Food & Drink | 7 |
| Festivals & Fairs | 8 |
| Charities | 9 |
| Sports & Active life | 10 |
| Nightlife | 11 |
| Kids & Family | 12 |
Example: 2 start URLs with `["Music", "Food & Drink"]` produce 4 requests.
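The expansion described above can be sketched as follows. The category IDs come from the table; the `CATEGORY_IDS` constant and `expandStartUrls` function are hypothetical names, and the choice to skip unknown category names is an assumption. The sketch also assumes each start URL already carries a query string (as the example URLs do), so `&c=<id>` can be appended directly.

```typescript
// Category name → Yelp category ID, as listed in the table above.
const CATEGORY_IDS: Record<string, number> = {
  "Other": 0,
  "Music": 1,
  "Visual Arts": 2,
  "Performing Arts": 3,
  "Film": 4,
  "Lectures & Books": 5,
  "Fashion": 6,
  "Food & Drink": 7,
  "Festivals & Fairs": 8,
  "Charities": 9,
  "Sports & Active life": 10,
  "Nightlife": 11,
  "Kids & Family": 12,
};

// Hypothetical helper: one request URL per (start URL, category) pair.
function expandStartUrls(startUrls: string[], categories?: string[]): string[] {
  // No filter: request each start URL as-is.
  if (!categories || categories.length === 0) return startUrls.slice();

  const expanded: string[] = [];
  for (const url of startUrls) {
    for (const name of categories) {
      const id = CATEGORY_IDS[name];
      if (id === undefined) continue; // assumption: unknown names are skipped
      expanded.push(`${url}&c=${id}`);
    }
  }
  return expanded;
}
```

With the Austin and San Diego example URLs and `["Music", "Food & Drink"]`, this yields four URLs ending in `&c=1` and `&c=7`.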
Browser Runtime Defaults
- Default browser type: `chromium`
- Supported browser types: `chromium`, `firefox`, `webkit`
- Default viewport: `1366x768`
- Default `javaScriptEnabled`: `true`
- Default `ignoreHTTPSErrors`: `true`
- User-Agent selection:
  - If `browserUserAgents` is provided and non-empty, one value is chosen randomly per run.
  - Otherwise, one value is chosen randomly from a built-in fallback pool.
When `browserType` is `chromium`, the actor always applies fixed launch arguments:
- `--disable-gpu`
- `--no-sandbox`
- `--disable-dev-shm-usage`
- `--disable-extensions`
- `--disable-popup-blocking`
- `--disable-blink-features=AutomationControlled`
- `--disable-web-security`
- `--disable-background-networking`
- `--disable-notifications`
- `--disable-infobars`
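The user-agent selection and launch-argument rules above might look roughly like this. The function names are hypothetical, the fallback pool holds placeholder strings (the real built-in pool is not shown in this README), and returning no extra arguments for non-Chromium browsers is an assumption.

```typescript
type BrowserType = "chromium" | "firefox" | "webkit";

// Placeholder values only; not the actor's real fallback pool.
const FALLBACK_USER_AGENTS: string[] = [
  "ExampleBrowser/1.0 (placeholder A)",
  "ExampleBrowser/1.0 (placeholder B)",
];

// Fixed Chromium launch arguments, as listed above.
const CHROMIUM_LAUNCH_ARGS: string[] = [
  "--disable-gpu",
  "--no-sandbox",
  "--disable-dev-shm-usage",
  "--disable-extensions",
  "--disable-popup-blocking",
  "--disable-blink-features=AutomationControlled",
  "--disable-web-security",
  "--disable-background-networking",
  "--disable-notifications",
  "--disable-infobars",
];

// Picks one user agent per run: from the custom pool when it is non-empty,
// otherwise from the fallback pool. `random` is injectable for testing.
function pickUserAgent(
  custom: string[] | undefined,
  random: () => number = Math.random,
): string {
  const pool = custom && custom.length > 0 ? custom : FALLBACK_USER_AGENTS;
  return pool[Math.floor(random() * pool.length)];
}

// Assumption: only Chromium receives the fixed arguments.
function launchArgsFor(browserType: BrowserType): string[] {
  return browserType === "chromium" ? CHROMIUM_LAUNCH_ARGS.slice() : [];
}
```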
Output
Each dataset item is a YelpEventRecord with these key fields:
| Field | Type | Description |
|---|---|---|
| `source_family` | `"yelp_events"` | Fixed identifier |
| `platform` | `"yelp"` | Fixed identifier |
| `raw_title` | `string \| null` | Event title |
| `raw_description` | `string \| null` | Event description text (line breaks preserved, including `<br>`-derived breaks) |
| `raw_start_text` | `string \| null` | Start date/time text (ISO when available) |
| `raw_end_text` | `string \| null` | End date/time text (ISO when available) |
| `raw_location_text` | `string \| null` | Venue/location text (line breaks, including `<br>`, normalized to comma-space) |
| `location` | `string \| null` | Venue/business name extracted from the event page biz-name link |
| `zip_code` | `string \| null` | 5-digit ZIP extracted from `raw_location_text` when pattern is present |
| `raw_image_url` | `string \| null` | Event image URL |
| `event_url` | `string \| null` | Direct event page URL |
| `external_event_id` | `string \| null` | Yelp event slug |
| `raw_payload_json` | `object` | Card HTML, detail HTML, strategy, version |
| `is_cancelled_hint` | `boolean \| null` | Cancellation detection |
| `starts_at_hint` | `string \| null` | ISO datetime if parseable from start text |
| `ends_at_hint` | `string \| null` | ISO datetime if parseable from end text |
| `fetched_at` | `string` | ISO timestamp |
Full schema: see src/types.ts YelpEventRecord interface.
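To make the `raw_location_text` and `zip_code` descriptions concrete, here is a minimal sketch of the two transformations. Both function names are hypothetical, and taking the last 5-digit group as the ZIP (so that a 5-digit street number is not mistaken for it) is an assumption rather than the actor's documented rule.

```typescript
// Converts <br> tags and newlines into ", " separators, dropping empty parts.
function normalizeLocationText(raw: string): string {
  return raw
    .replace(/<br\s*\/?>/gi, "\n")
    .split(/\r?\n/)
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .join(", ");
}

// Returns a 5-digit ZIP if one is present in the location text, else null.
// Assumption: when several 5-digit groups appear, the last one is the ZIP.
function extractZipCode(locationText: string | null): string | null {
  if (!locationText) return null;
  const matches = locationText.match(/\b\d{5}\b/g);
  return matches ? matches[matches.length - 1] : null;
}
```

For example, `normalizeLocationText("123 Main St<br>Austin, TX 78701")` gives `"123 Main St, Austin, TX 78701"`, from which `extractZipCode` returns `"78701"`.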
BFU-Feeder Integration
This repository's output aligns with BFU-Feeder's RawEventContract:
| Actor field | Contract field |
|---|---|
| `raw_title` | `raw_title` |
| `raw_start_text` | `raw_start_text` |
| `raw_location_text` | `raw_location_text` |
| `zip_code` | `zip_code` |
| `event_url` | `raw_url` |
| `raw_image_url` | `raw_image_url` |
| `raw_description` | `raw_description` |
| `raw_end_text` | `raw_end_text` |
| `ends_at_hint` | `ends_at_hint` |
| `raw_payload_json` | `raw_payload_json` |
| `is_cancelled_hint` | `is_cancelled_hint` |
| `external_event_id` | `external_event_id` (prefixed with `apify-` by mapper) |
| `fetched_at` | `fetched_at` |
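The field mapping above could be expressed as a small function. This is an illustrative sketch covering a subset of the fields: the `ActorRecord` and `ContractRecord` types and the `toContract` function are hypothetical, and leaving a `null` ID unprefixed is an assumption. Only the `event_url` → `raw_url` rename and the `apify-` prefix come from the table.

```typescript
// Subset of the actor's output fields (see the Output table).
interface ActorRecord {
  raw_title: string | null;
  raw_start_text: string | null;
  raw_end_text: string | null;
  raw_location_text: string | null;
  zip_code: string | null;
  event_url: string | null;
  external_event_id: string | null;
  fetched_at: string;
}

// Matching subset of the contract fields; note `raw_url` instead of `event_url`.
interface ContractRecord {
  raw_title: string | null;
  raw_start_text: string | null;
  raw_end_text: string | null;
  raw_location_text: string | null;
  zip_code: string | null;
  raw_url: string | null;
  external_event_id: string | null;
  fetched_at: string;
}

function toContract(record: ActorRecord): ContractRecord {
  return {
    raw_title: record.raw_title,
    raw_start_text: record.raw_start_text,
    raw_end_text: record.raw_end_text,
    raw_location_text: record.raw_location_text,
    zip_code: record.zip_code,
    raw_url: record.event_url, // renamed field
    // Assumption: a missing ID stays null rather than becoming "apify-null".
    external_event_id:
      record.external_event_id === null
        ? null
        : `apify-${record.external_event_id}`,
    fetched_at: record.fetched_at,
  };
}
```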
Development
Prerequisites
- Node.js 18+
- npm 9+
Setup
    npm install
Build
    npm run build    # TypeScript → dist/
    make build       # Same as npm run build
Test
    npm test                 # Run all tests with vitest
    npm run test:coverage    # Run tests with coverage output
    npm run test:watch       # Watch mode
    make test                # Run tests with coverage output
    make test-watch          # Watch mode via make
Common Make targets
    make help          # List available targets
    make install       # Install dependencies
    make build         # Compile TypeScript
    make test          # Run tests with coverage report
    make test-watch    # Run Vitest in watch mode
    make run           # Run the actor locally with Apify CLI
    make clean         # Remove dist/, coverage/, and local runtime artifacts
Run locally
    # Using Apify CLI
    apify run --input='{"startUrls": [{"url": "https://www.yelp.com/events/austin-tx-us/browse?official=0"}], "maxItems": 10}'
Architecture
    src/
    ├── main.ts             # Actor entry point + PlaywrightCrawler orchestration
    ├── types.ts            # All TypeScript interfaces + category map
    ├── url_utils.ts        # URL expansion for category filter
    ├── navigation.ts       # Page classification (list, detail, blocked, unsupported)
    ├── list_parser.ts      # Multi-strategy event card extraction from browse pages
    ├── detail_parser.ts    # Event detail page enrichment
    ├── output_mapper.ts    # Merge card + detail → YelpEventRecord
    ├── browser_runtime.ts  # Browser runtime normalization and launch settings
    └── diagnostics.ts      # Run metrics tracker
    tests/
    ├── url_utils.test.ts
    ├── list_parser.test.ts
    ├── detail_parser.test.ts
    ├── output_mapper.test.ts
    ├── browser_runtime.test.ts
    └── diagnostics.test.ts
Parser Strategies
The list parser tries multiple selector strategies in order:
- **classic-events-list**: `ul.events-index_events-list li` and similar containers
- **card-layout**: `[class*="event-card"]` and data-testid patterns
- **anchor-fallback**: Links to `/events/` detail pages with container inference
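The ordered fallback between strategies can be sketched generically. Nothing here is taken from `list_parser.ts`: the types and `parseEventList` are hypothetical, the page is a stand-in object that maps a selector to the cards it would match (instead of a real DOM), and the `a[href^="/events/"]` selector for the anchor fallback is an assumption. The sketch assumes the first strategy that yields at least one card wins.

```typescript
interface EventCard {
  title: string;
  url: string;
}

interface ListStrategy<TPage> {
  name: string;
  extract: (page: TPage) => EventCard[];
}

interface ListParseResult {
  strategy: string | null; // name of the strategy that produced the cards
  cards: EventCard[];
}

// Tries each strategy in order and stops at the first one that finds cards.
function parseEventList<TPage>(
  page: TPage,
  strategies: ListStrategy<TPage>[],
): ListParseResult {
  for (const strategy of strategies) {
    const cards = strategy.extract(page);
    if (cards.length > 0) return { strategy: strategy.name, cards };
  }
  return { strategy: null, cards: [] };
}

// Stand-in for a real page: selector → cards that selector would match.
type FakePage = Record<string, EventCard[]>;

const listStrategies: ListStrategy<FakePage>[] = [
  {
    name: "classic-events-list",
    extract: (page) => page["ul.events-index_events-list li"] ?? [],
  },
  {
    name: "card-layout",
    extract: (page) => page['[class*="event-card"]'] ?? [],
  },
  {
    name: "anchor-fallback",
    extract: (page) => page['a[href^="/events/"]'] ?? [], // hypothetical selector
  },
];
```

Recording the winning strategy name alongside the cards is what would let a field such as `strategy` in `raw_payload_json` be filled in.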
Page Classification
| Classification | Meaning |
|---|---|
| `yelp_events_list` | Yelp events browse page with event cards |
| `yelp_event_detail` | Individual event detail page |
| `blocked_challenge` | CAPTCHA, rate-limit block, or "unusual activity" page |
| `unsupported_layout` | Page loaded but no recognizable events layout |
| `error` | Navigation or timeout error |
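One way to picture the classification is a function over a small page snapshot. The classification names come from the table; everything else is hypothetical, including the `PageSnapshot` shape, the text markers used to detect a challenge page, and the URL patterns for browse and detail pages. The actor's real heuristics in `navigation.ts` may differ.

```typescript
type PageClassification =
  | "yelp_events_list"
  | "yelp_event_detail"
  | "blocked_challenge"
  | "unsupported_layout"
  | "error";

// Hypothetical snapshot of what the crawler observed on a page.
interface PageSnapshot {
  url: string;
  bodyText: string;
  eventCardCount: number;
  navigationFailed: boolean;
}

function classifyPage(page: PageSnapshot): PageClassification {
  if (page.navigationFailed) return "error";

  // Assumed markers for a CAPTCHA or "unusual activity" page.
  const text = page.bodyText.toLowerCase();
  if (text.includes("unusual activity") || text.includes("captcha")) {
    return "blocked_challenge";
  }

  const path = new URL(page.url).pathname;
  // Assumed browse URL shape: /events/<city>/browse
  if (/^\/events\/[^/]+\/browse\/?$/.test(path)) {
    return page.eventCardCount > 0 ? "yelp_events_list" : "unsupported_layout";
  }
  // Assumed detail URL shape: /events/<slug>
  if (/^\/events\/[^/]+\/?$/.test(path)) return "yelp_event_detail";

  return "unsupported_layout";
}
```

Checking for a block page before inspecting the URL means a challenge served on a browse URL is reported as `blocked_challenge` rather than `unsupported_layout`.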
Diagnostics
Run diagnostics are saved to the key-value store under the key run-diagnostics and include:
- Pages visited, events found/emitted/skipped
- Detail page attempt/success counts
- Partial failure count (events emitted with warnings)
- Per-source breakdown (URL, classification, counts, warnings)
- Global parser warnings and unsupported layout notes
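A tracker producing this kind of summary might be structured as below. This is a sketch under assumptions: the `RunDiagnostics` class, its method names, and the field names in the snapshot are all hypothetical, and it covers only a few of the metrics listed above. Persisting the snapshot under the `run-diagnostics` key is left to the caller.

```typescript
interface SourceDiagnostics {
  url: string;
  classification: string | null;
  eventsFound: number;
  eventsEmitted: number;
  eventsSkipped: number;
  warnings: string[];
}

// Hypothetical tracker: accumulates per-source counts and derives global totals.
class RunDiagnostics {
  private pagesVisited = 0;
  private readonly sources = new Map<string, SourceDiagnostics>();

  // Returns the entry for a source URL, creating it on first use.
  private source(url: string): SourceDiagnostics {
    let entry = this.sources.get(url);
    if (!entry) {
      entry = {
        url,
        classification: null,
        eventsFound: 0,
        eventsEmitted: 0,
        eventsSkipped: 0,
        warnings: [],
      };
      this.sources.set(url, entry);
    }
    return entry;
  }

  recordPage(sourceUrl: string, classification: string): void {
    this.pagesVisited += 1;
    this.source(sourceUrl).classification = classification;
  }

  recordEvents(sourceUrl: string, found: number, emitted: number): void {
    const entry = this.source(sourceUrl);
    entry.eventsFound += found;
    entry.eventsEmitted += emitted;
    entry.eventsSkipped += found - emitted;
  }

  warn(sourceUrl: string, message: string): void {
    this.source(sourceUrl).warnings.push(message);
  }

  // Plain object suitable for JSON serialization.
  snapshot() {
    const sources = Array.from(this.sources.values());
    const sum = (pick: (s: SourceDiagnostics) => number) =>
      sources.reduce((total, s) => total + pick(s), 0);
    return {
      pagesVisited: this.pagesVisited,
      eventsFound: sum((s) => s.eventsFound),
      eventsEmitted: sum((s) => s.eventsEmitted),
      eventsSkipped: sum((s) => s.eventsSkipped),
      sources,
    };
  }
}
```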