All notable changes to the Luma Events Scraper Actor will be documented in this file.
The format is based on Keep a Changelog,
and this project adheres to Semantic Versioning.
Implementation Phase - 2025-12-06
Complete Actor implementation with API-first and HTML fallback approaches
Input schema with configurable parameters (startUrls, maxEvents, useApi, paginationLimit)
Output schema with dataset view for event data
Date parsing utility supporting multiple formats:
Date ranges (e.g., "Dec 6 & 7, 2025")
Single dates with time (e.g., "Dec 6, 2025 at 2:00 PM")
Relative dates ("Today", "Tomorrow")
ISO 8601 conversion with raw date preservation
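The date formats listed above can be illustrated with a minimal sketch. It is not the Actor's actual utility: the names `parseEventDate` and `ParsedDate` are hypothetical, times are assumed to be UTC, and a range such as "Dec 6 & 7, 2025" is assumed to resolve to its first day.

```typescript
// Hypothetical result shape: the ISO 8601 value plus the untouched input.
interface ParsedDate {
    date: string | null; // null when the format is not recognized
    rawDate: string;
}

const MONTHS = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec'];

// Matches "Dec 6, 2025", "Dec 6 & 7, 2025" and "Dec 6, 2025 at 2:00 PM".
const DATE_PATTERN =
    /^([A-Za-z]{3})[a-z]*\.?\s+(\d{1,2})(?:\s*&\s*\d{1,2})?,\s*(\d{4})(?:\s+at\s+(\d{1,2}):(\d{2})\s*(AM|PM))?$/i;

function parseEventDate(raw: string, now: Date = new Date()): ParsedDate {
    const text = raw.trim();
    const lower = text.toLowerCase();

    // Relative dates resolve against `now` (assumed UTC midnight).
    if (lower === 'today' || lower === 'tomorrow') {
        const offset = lower === 'tomorrow' ? 1 : 0;
        const d = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + offset));
        return { date: d.toISOString(), rawDate: raw };
    }

    const match = text.match(DATE_PATTERN);
    if (!match) return { date: null, rawDate: raw };

    const [, mon = '', day = '', year = '', hh, mm, ampm] = match;
    const month = MONTHS.indexOf(mon.toLowerCase());
    if (month < 0) return { date: null, rawDate: raw };

    // Convert the 12-hour clock to 24-hour; a missing time means midnight.
    let hour = hh ? Number(hh) % 12 : 0;
    if (ampm && ampm.toUpperCase() === 'PM') hour += 12;
    const minute = mm ? Number(mm) : 0;

    const d = new Date(Date.UTC(Number(year), month, Number(day), hour, minute));
    return { date: d.toISOString(), rawDate: raw };
}
```

Returning `rawDate` alongside `date` keeps the original string available when the parser does not recognize a format.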
API-based scraping:
Automatic discovery of place API ID from page
Pagination support
Error handling with fallback to HTML parsing
HTML parsing fallback:
CSS selector-based event extraction
Date extraction from event cards
URL normalization (relative to absolute)
Data validation and cleaning:
Required field validation (eventName, eventUrl, date)
Event name cleaning (trim whitespace)
URL normalization
Skip invalid events with warnings
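The validation and cleaning steps above could look roughly like the following sketch. The names `RawEvent`, `CleanEvent`, `normalizeUrl`, and `cleanEvents` are hypothetical, and the real implementation may check more than this.

```typescript
// Hypothetical shapes for an event before and after cleaning.
interface RawEvent {
    eventName?: string;
    eventUrl?: string;
    date?: string | null;
}

interface CleanEvent {
    eventName: string;
    eventUrl: string;
    date: string;
}

// Resolve a relative href against the page URL; return null if it cannot be parsed.
function normalizeUrl(href: string, baseUrl: string): string | null {
    try {
        return new URL(href, baseUrl).href;
    } catch {
        return null;
    }
}

function cleanEvents(
    events: RawEvent[],
    baseUrl: string,
    warn: (msg: string) => void = console.warn,
): CleanEvent[] {
    const valid: CleanEvent[] = [];
    for (const event of events) {
        const eventName = (event.eventName ?? '').trim();
        const eventUrl = event.eventUrl ? normalizeUrl(event.eventUrl, baseUrl) : null;
        const date = event.date ?? '';

        // eventName, eventUrl and date are all required; skip the event otherwise.
        if (!eventName || !eventUrl || !date) {
            warn(`Skipping invalid event: ${JSON.stringify(event)}`);
            continue;
        }
        valid.push({ eventName, eventUrl, date });
    }
    return valid;
}
```

Passing `warn` as a parameter is only a convenience for this sketch, so that warnings can be captured in a test instead of being printed.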
Updated src/main.ts:
New Input interface matching input schema
Configured PlaywrightCrawler with a concurrency of 1 and maxRequestsPerCrawl set to 10
Updated src/routes.ts:
Complete rewrite with event scraping logic
Hybrid API/HTML approach
Comprehensive date parsing
Updated .actor/actor.json:
Updated title and description
Set version to 0.0.1
Updated generatedBy metadata
Updated .actor/input_schema.json:
Added maxEvents, useApi, paginationLimit parameters
Updated default startUrls to https://luma.com/sf
Updated .actor/dataset_schema.json:
New schema for event data structure
Table view with eventName, eventUrl, date, rawDate, scrapedAt fields
Technical Implementation Details
API endpoint: https://api2.luma.com/discover/get-paginated-events
Place API ID extraction: Multiple fallback strategies (HTML patterns, Next.js data, default SF ID)
Date parsing: Handles date ranges, single dates with time, and relative dates, with ISO 8601 conversion
Error handling: Graceful degradation from API to HTML parsing
Output format: JSON with eventName, eventUrl, date (ISO), rawDate, scrapedAt, sourceUrl
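As an illustration of the place API ID fallback chain described above, here is a minimal sketch. The regular expressions, the assumed ID shape ("discplace-" followed by alphanumerics), and the function names are assumptions for illustration. Only the default SF ID and the `discover_place_api_id` parameter name come from this changelog.

```typescript
// Default SF place ID as recorded in this changelog.
const DEFAULT_SF_PLACE_ID = 'discplace-BDj7GNbGlsF7Cka';

// Assumed ID shape; the real format may differ.
const PLACE_ID_PATTERN = /discplace-[A-Za-z0-9]+/;

// Walk parsed JSON depth-first and return the first string that looks like a place ID.
function findPlaceIdInJson(value: unknown): string | null {
    if (typeof value === 'string') {
        return value.match(PLACE_ID_PATTERN)?.[0] ?? null;
    }
    if (value !== null && typeof value === 'object') {
        for (const child of Object.values(value as Record<string, unknown>)) {
            const found = findPlaceIdInJson(child);
            if (found) return found;
        }
    }
    return null;
}

function extractPlaceApiId(html: string): string {
    // Strategy 1: an explicit "discover_place_api_id" occurrence in the HTML.
    const directId = html.match(/discover_place_api_id["'=:\s]+(discplace-[A-Za-z0-9]+)/)?.[1];
    if (directId) return directId;

    // Strategy 2: the JSON payload Next.js embeds in its __NEXT_DATA__ script tag.
    const nextJson = html.match(/<script id="__NEXT_DATA__"[^>]*>([\s\S]*?)<\/script>/)?.[1];
    if (nextJson) {
        try {
            const found = findPlaceIdInJson(JSON.parse(nextJson));
            if (found) return found;
        } catch {
            // Malformed JSON: fall through to the default.
        }
    }

    // Strategy 3: fall back to the default SF ID.
    return DEFAULT_SF_PLACE_ID;
}
```

Ordering the strategies from most specific to least specific means the hard-coded default is used only when nothing on the page identifies the place.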
Analysis Phase - 2025-12-06
Initial project setup with PlaywrightCrawler template
PRD.md created with comprehensive requirements
CHANGELOG.md created for tracking changes
Browser analysis of https://luma.com/sf completed
Website uses Next.js with client-side rendering
Events are loaded dynamically via API endpoint: https://api2.luma.com/discover/get-paginated-events
API endpoint parameters:
discover_place_api_id=discplace-BDj7GNbGlsF7Cka (for SF location)
pagination_limit=25 (default page size)
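For reference, a request URL with the two parameters above can be assembled as in this sketch. The helper name `buildEventsApiUrl` is hypothetical, and any further parameters the API accepts (for example a pagination cursor) are not covered here.

```typescript
const API_ENDPOINT = 'https://api2.luma.com/discover/get-paginated-events';

// Build the request URL from the place ID and page size noted in the analysis.
function buildEventsApiUrl(placeApiId: string, paginationLimit: number = 25): string {
    const url = new URL(API_ENDPOINT);
    url.searchParams.set('discover_place_api_id', placeApiId);
    url.searchParams.set('pagination_limit', String(paginationLimit));
    return url.toString();
}

// buildEventsApiUrl('discplace-BDj7GNbGlsF7Cka')
// returns "https://api2.luma.com/discover/get-paginated-events?discover_place_api_id=discplace-BDj7GNbGlsF7Cka&pagination_limit=25"
```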
Events are rendered as button elements containing link elements
Event structure includes:
Event name (in link text)
Event URL (in href attribute)
Date information (embedded in name or separate element)
Additional metadata (organizer, location, RSVP status)
Will use PlaywrightCrawler (required for JavaScript-heavy site)
Will implement hybrid approach: API-first, HTML parsing fallback
Output format defined: eventName, eventUrl, date, rawDate, scrapedAt, sourceUrl
Project initialized with Apify Actor template
TypeScript configuration
Playwright dependencies installed
Basic file structure created