
# 🔧 Scraper – Marseille Accessible Events

Crawlee + TypeScript actor that fetches the accessible-events playlist from the official Marseille-Tourisme API, then visits every event page, parses its `application/ld+json` schema and stores a clean, flat `Event` record to an Apify dataset.
| Stack | Why |
|---|---|
| Apify SDK v3 | Cloud-ready actor runtime, key-value stores & datasets |
| Crawlee 3 · CheerioCrawler | Fast HTTP/HTML scraping with built-in queue |
| TypeScript | Strong typing (see `src/types.ts` → `Event`) |
| dayjs | Elegant date maths for the playlist facet |
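To make the table concrete, here is a minimal sketch of how these pieces could be wired together in `src/main.ts`. It is illustrative only: the playlist URL is a placeholder and `buildPlaylistBody` is a hypothetical helper (see the facet sketch under Features); the real entry point lives in the repository.

```ts
// Illustrative sketch only – not the actor's actual main.ts.
import { Actor } from 'apify';
import { CheerioCrawler } from 'crawlee';
import { router } from './routes.js';           // Cheerio router with PLAYLIST + EVENT_PAGE handlers
import { buildPlaylistBody } from './facet.js'; // hypothetical helper, sketched under Features

interface Input {
    maxEvents?: number;
    monthsAhead?: number;
}

await Actor.init();

const { maxEvents = 1000, monthsAhead = 3 } = (await Actor.getInput<Input>()) ?? {};

// Placeholder – the real Marseille-Tourisme playlist endpoint lives in the actor source.
const PLAYLIST_URL = 'https://example.com/accessible-events-playlist';

const crawler = new CheerioCrawler({ requestHandler: router });

// Seed a single POST request; its handler enqueues one request per event page.
await crawler.run([{
    url: PLAYLIST_URL,
    method: 'POST',
    payload: JSON.stringify(buildPlaylistBody(maxEvents, monthsAhead)),
    headers: { 'content-type': 'application/json' },
    label: 'PLAYLIST',
}]);

await Actor.exit();
```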
## ✨ Features

- One-shot playlist POST: `size = maxEvents`, `start = 0`.
- Dynamic date-window facet: `start = today 00:00`, `end = today + monthsAhead` (end of month); see the sketch after this list.
- Only events with wheelchair criteria are requested.
- Extracts title, description, dates, venue, geo & images from the schema graph, deduplicating overlapping WebPage/Event fields.
- Output dataset contains tidy `Event` objects (see schema below).
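The date-window facet from the second bullet could be built with dayjs roughly as follows. This is a sketch: the date arithmetic mirrors the bullets above, but the body shape and field names (`size`, `start`, `facets`, `dateRange`, `criteria`, `'wheelchair'`) as well as the date format are assumptions, not the API's documented contract.

```ts
// Hypothetical src/facet.ts – sketch of the playlist POST body.
import dayjs from 'dayjs';

export function buildPlaylistBody(maxEvents: number, monthsAhead: number) {
    const start = dayjs().startOf('day');                          // today 00:00
    const end = dayjs().add(monthsAhead, 'month').endOf('month');  // today + N months, end of month

    return {
        size: maxEvents, // one-shot request: ask for everything at once
        start: 0,
        facets: {
            dateRange: {
                start: start.format('YYYY-MM-DD'),
                end: end.format('YYYY-MM-DD'),
            },
            criteria: ['wheelchair'], // only wheelchair-accessible events
        },
    };
}
```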
## 📦 Project structure

```text
src/
  main.ts           – actor entry: seeds the playlist POST & starts the crawler
  routes.ts         – Cheerio router (PLAYLIST + EVENT_PAGE)
  types.ts          – export interface Event
package.json
README.md           – you are here
apify.json          – actor manifest
INPUT_SCHEMA.json   – UI + validation for actor input
```
## 🔧 Input

| Field | Type | Default | Prefill | Description |
|---|---|---|---|---|
| `maxEvents` | integer | 1000 | 42 | Max number of events to request in the playlist POST |
| `monthsAhead` | integer | 3 | – | Date-window length (today → today + N months) |

`INPUT_SCHEMA.json` enforces both fields; `maxEvents` is required.
Example `input.json`:

```json
{
  "maxEvents": 42,
  "monthsAhead": 3
}
```
## ▶️ Run locally

```bash
# install deps
npm install

# optional: build once
npm run build

# run with the example input
apify run -i input.json   # or npm start
```

Add `--purge` to clear previous datasets / queues:

```bash
apify run --purge -i input.json
```
## Output dataset (`Event`)

```ts
interface Event {
  url: string;          // canonical event URL
  name: string | null;
  description?: string;
  startDate?: string;   // ISO YYYY-MM-DD
  endDate?: string;
  venue?: string;
  address?: string;
  city?: string;
  postalCode?: string;
  latitude?: string;
  longitude?: string;
  images?: string[];
}
```

Empty / duplicate fields are removed before the record is pushed.
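One way that cleanup step could look (a sketch with an illustrative helper name; the actor's actual implementation may differ):

```ts
// Sketch: drop empty values before pushing so the dataset stays tidy.
import { Actor } from 'apify';
import type { Event } from './types.js';

async function pushClean(event: Event): Promise<void> {
    const clean = Object.fromEntries(
        Object.entries(event).filter(([, value]) =>
            value !== undefined
            && value !== null
            && value !== ''
            && !(Array.isArray(value) && value.length === 0)),
    );
    await Actor.pushData(clean);
}
```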
## Environment variables

None.

If you route traffic through the Apify proxy, set it in `apify.json` or export `APIFY_PROXY_PASSWORD`.
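If you do opt into Apify Proxy, the standard Crawlee pattern looks roughly like this (a sketch, not taken from this actor's source):

```ts
// Sketch: routing CheerioCrawler traffic through Apify Proxy.
import { Actor } from 'apify';
import { CheerioCrawler } from 'crawlee';
import { router } from './routes.js'; // the actor's Cheerio router

await Actor.init();

// Picks up APIFY_PROXY_PASSWORD (or the platform-injected credentials).
const proxyConfiguration = await Actor.createProxyConfiguration();

const crawler = new CheerioCrawler({
    proxyConfiguration,
    requestHandler: router,
});
```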
## 🗺️ High-level flow

```text
Input → build POST body → CheerioCrawler
  ├─ PLAYLIST handler   : parse JSON, enqueue https:// links (label: EVENT_PAGE) → RequestQueue 'playlist'
  └─ EVENT_PAGE handler : parse LD+JSON, build typed Event → Apify Dataset (Event records)
```
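A rough sketch of what `src/routes.ts` might look like for this flow. The handler labels match the diagram; the playlist response shape (`items`, `url`) and the LD+JSON graph handling are assumptions, and only a few `Event` fields are shown.

```ts
// Sketch of src/routes.ts – not the actor's actual router.
import { createCheerioRouter } from 'crawlee';
import { Actor } from 'apify';

export const router = createCheerioRouter();

// PLAYLIST: parse the API response and enqueue one request per event page.
router.addHandler('PLAYLIST', async ({ body, crawler, log }) => {
    const playlist = JSON.parse(body.toString());
    // Field names below are assumptions about the API response shape.
    const urls: string[] = (playlist.items ?? []).map((item: any) => item.url);
    log.info(`Enqueuing ${urls.length} event pages`);
    await crawler.addRequests(urls.map((url) => ({ url, label: 'EVENT_PAGE' })));
});

// EVENT_PAGE: read the application/ld+json graph and push a typed Event.
router.addHandler('EVENT_PAGE', async ({ $, request }) => {
    const raw = $('script[type="application/ld+json"]').first().html() ?? '{}';
    const graph = JSON.parse(raw);
    const node = Array.isArray(graph['@graph'])
        ? graph['@graph'].find((n: any) => n['@type'] === 'Event') ?? {}
        : graph;
    // Only a few fields shown; the real handler maps the full Event interface.
    await Actor.pushData({
        url: request.loadedUrl ?? request.url,
        name: node.name ?? null,
        description: node.description,
        startDate: node.startDate,
        endDate: node.endDate,
    });
});
```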
## License

MIT © 2025 yfe404