Changelog
All notable changes to the Luma Events Scraper Actor will be documented in this file.
The format is based on Keep a Changelog,
and this project adheres to Semantic Versioning.
[Unreleased]
Implementation Phase - 2025-12-06
Added
Complete Actor implementation with API-first and HTML fallback approaches
Input schema with configurable parameters (startUrls, maxEvents, useApi, paginationLimit)
Output schema with dataset view for event data
Date parsing utility supporting multiple formats:
Date ranges (e.g., "Dec 6 & 7, 2025")
Single dates with time (e.g., "Dec 6, 2025 at 2:00 PM")
Relative dates ("Today", "Tomorrow")
ISO 8601 conversion with raw date preservation
API-based scraping:
Automatic discovery of place API ID from page
Pagination support
Error handling with fallback to HTML parsing
HTML parsing fallback:
CSS selector-based event extraction
Date extraction from event cards
URL normalization (relative to absolute)
Data validation and cleaning:
Required field validation (eventName, eventUrl, date)
Event name cleaning (trim whitespace)
URL normalization
Skip invalid events with warnings
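The validation and cleaning steps above could look roughly like the sketch below. It is a minimal illustration, not the Actor's actual code: the names RawEvent, CleanEvent, and cleanEvent are hypothetical, and the exact warning text is assumed.

```typescript
// Hypothetical types and names for illustration; not taken from the Actor's source.
interface RawEvent {
    eventName?: string;
    eventUrl?: string;
    date?: string;
}

interface CleanEvent {
    eventName: string;
    eventUrl: string;
    date: string;
}

// Returns a cleaned event, or null (after emitting a warning) when the event is invalid.
function cleanEvent(
    raw: RawEvent,
    baseUrl: string,
    warn: (msg: string) => void = console.warn,
): CleanEvent | null {
    // Event name cleaning: trim surrounding whitespace.
    const eventName = (raw.eventName ?? '').trim();
    const href = (raw.eventUrl ?? '').trim();
    const date = (raw.date ?? '').trim();

    // Required field validation (eventName, eventUrl, date).
    const missing: string[] = [];
    if (!eventName) missing.push('eventName');
    if (!href) missing.push('eventUrl');
    if (!date) missing.push('date');
    if (missing.length > 0) {
        warn(`Skipping event, missing: ${missing.join(', ')}`);
        return null;
    }

    // URL normalization: resolve relative hrefs against the page URL.
    let eventUrl: string;
    try {
        eventUrl = new URL(href, baseUrl).href;
    } catch {
        warn(`Skipping event, invalid URL: ${href}`);
        return null;
    }

    return { eventName, eventUrl, date };
}
```

Passing the warn callback as a parameter keeps the helper independent of any particular logger, which also makes the skip behavior easy to check in isolation.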
Changed
Updated src/main.ts:
New Input interface matching input schema
Configured PlaywrightCrawler with a concurrency of 1 and maxRequestsPerCrawl set to 10
Updated src/routes.ts:
Complete rewrite with event scraping logic
Hybrid API/HTML approach
Comprehensive date parsing
Updated .actor/actor.json:
Updated title and description
Set version to 0.0.1
Updated generatedBy metadata
Updated .actor/input_schema.json:
Added maxEvents, useApi, paginationLimit parameters
Updated default startUrls to https://luma.com/sf
Updated .actor/dataset_schema.json:
New schema for event data structure
Table view with eventName, eventUrl, date, rawDate, scrapedAt fields
Technical Implementation Details
API endpoint: https://api2.luma.com/discover/get-paginated-events
Place API ID extraction: Multiple fallback strategies (HTML patterns, Next.js data, default SF ID)
Date parsing: Handles various formats with ISO 8601 conversion
Error handling: Graceful degradation from API to HTML parsing
Output format: JSON with eventName, eventUrl, date (ISO), rawDate, scrapedAt, sourceUrl
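As an illustration of the date parsing entry, the sketch below handles the formats listed under Added (date ranges, single dates with time, "Today", "Tomorrow") and returns both the ISO 8601 value and the raw text. The names ParsedDate and parseEventDate are hypothetical. Two behaviors are assumptions, since the changelog does not state them: times are interpreted as UTC, and a range such as "Dec 6 & 7, 2025" resolves to its first day.

```typescript
// Hypothetical helper for illustration; not taken from the Actor's source.
interface ParsedDate {
    date: string | null; // ISO 8601, or null when the text is not recognized
    rawDate: string;     // original text, preserved unchanged
}

const MONTHS = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec'];

// Matches "Dec 6, 2025", "Dec 6 & 7, 2025", and "Dec 6, 2025 at 2:00 PM".
const DATE_PATTERN =
    /^([a-z]{3})[a-z]*\.?\s+(\d{1,2})(?:\s*&\s*\d{1,2})?,\s*(\d{4})(?:\s+at\s+(\d{1,2}):(\d{2})\s*(am|pm))?$/i;

function parseEventDate(raw: string, now: Date = new Date()): ParsedDate {
    const text = raw.trim();
    const lower = text.toLowerCase();

    // Relative dates resolve against `now`, at UTC midnight (assumption).
    if (lower === 'today' || lower === 'tomorrow') {
        const offsetDays = lower === 'tomorrow' ? 1 : 0;
        const relativeMs = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + offsetDays);
        return { date: new Date(relativeMs).toISOString(), rawDate: raw };
    }

    const m = DATE_PATTERN.exec(text);
    if (m === null) return { date: null, rawDate: raw };

    const [, mon = '', day = '', year = '', hh, mm = '0', ampm = ''] = m;
    const month = MONTHS.indexOf(mon.toLowerCase());
    if (month === -1) return { date: null, rawDate: raw };

    // Convert the optional 12-hour time to 24-hour form.
    let hour = 0;
    let minute = 0;
    if (hh !== undefined) {
        hour = Number(hh) % 12;
        if (ampm.toLowerCase() === 'pm') hour += 12;
        minute = Number(mm);
    }

    // For ranges, only the first day is used (assumption).
    const ms = Date.UTC(Number(year), month, Number(day), hour, minute);
    return { date: new Date(ms).toISOString(), rawDate: raw };
}
```

Accepting `now` as a parameter makes the relative cases deterministic, so "Today" and "Tomorrow" can be checked against a fixed reference date.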
Analysis Phase - 2025-12-06
Added
Initial project setup with PlaywrightCrawler template
PRD.md created with comprehensive requirements
CHANGELOG.md created for tracking changes
Browser analysis of https://luma.com/sf completed
Discovered
Website uses Next.js with client-side rendering
Events are loaded dynamically via API endpoint: https://api2.luma.com/discover/get-paginated-events
API endpoint parameters:
discover_place_api_id=discplace-BDj7GNbGlsF7Cka (for SF location)
pagination_limit=25 (default page size)
Events are rendered as button elements containing link elements
Event structure includes:
Event name (in link text)
Event URL (in href attribute)
Date information (embedded in name or separate element)
Additional metadata (organizer, location, RSVP status)
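The endpoint and parameters noted above can be combined into a request URL as in the sketch below. The function names extractPlaceApiId and buildEventsUrl are hypothetical, and the regular expression assumes that every place ID has the same "discplace-" plus alphanumerics shape as the SF ID, which the analysis did not confirm. The response format of the endpoint is not covered here.

```typescript
// Endpoint and parameter names are taken from the analysis notes above.
const EVENTS_ENDPOINT = 'https://api2.luma.com/discover/get-paginated-events';
const DEFAULT_SF_PLACE_ID = 'discplace-BDj7GNbGlsF7Cka';

// Hypothetical helper: finds the first place ID embedded in the page HTML,
// falling back to the SF ID when none is found.
// Assumption: IDs look like "discplace-" followed by letters and digits.
function extractPlaceApiId(html: string): string {
    const match = /discplace-[A-Za-z0-9]+/.exec(html);
    return match ? match[0] : DEFAULT_SF_PLACE_ID;
}

// Hypothetical helper: builds the paginated events URL for a place.
function buildEventsUrl(placeApiId: string, paginationLimit: number = 25): string {
    const url = new URL(EVENTS_ENDPOINT);
    url.searchParams.set('discover_place_api_id', placeApiId);
    url.searchParams.set('pagination_limit', String(paginationLimit));
    return url.href;
}

// Example: a page with no embedded ID falls back to the SF default.
const exampleUrl = buildEventsUrl(extractPlaceApiId('<html></html>'));
```

Building the query with URLSearchParams rather than string concatenation keeps parameter encoding correct if an ID ever contains reserved characters.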
Technical Decisions
Will use PlaywrightCrawler (required for JavaScript-heavy site)
Will implement hybrid approach: API-first, HTML parsing fallback
Output format defined: eventName, eventUrl, date, rawDate, scrapedAt, sourceUrl
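One way to express the API-first, HTML-fallback decision is a small wrapper that tries a primary strategy and switches to a secondary one on failure. This is a sketch only: withFallback is a hypothetical name, and treating an empty API result as a reason to fall back is an assumption rather than documented behavior.

```typescript
// Hypothetical helper for illustration; not taken from the Actor's source.
// Runs `primary`; if it throws or yields no items, runs `fallback` instead.
async function withFallback<T>(
    primary: () => Promise<T[]>,
    fallback: () => Promise<T[]>,
    warn: (msg: string) => void = console.warn,
): Promise<T[]> {
    try {
        const items = await primary();
        if (items.length > 0) return items;
        // Assumption: an empty result is treated like a failure.
        warn('Primary strategy returned no items; using fallback');
    } catch (err) {
        warn(`Primary strategy failed (${String(err)}); using fallback`);
    }
    return fallback();
}
```

In this arrangement the API scraper would be passed as `primary` and the HTML parser as `fallback`, so the degradation path lives in one place instead of being repeated in each handler.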
[0.0.1] - 2025-12-06
Initial Setup
Project initialized with Apify Actor template
TypeScript configuration
Playwright dependencies installed
Basic file structure created