Scraper Marseille Accessible Events

Under maintenance

Pricing

Pay per usage

Developed by

Yann Feunteun

Maintained by Community

Collects wheelchair-friendly events from the official Marseille tourism site. It opens each event page, pulls title, summary, dates, venue, images and GPS coordinates, removes duplicates, and delivers a clean, ready-to-use list in an Apify dataset.

🧭 Scraper – Marseille Accessible Events

Crawlee + TypeScript actor that fetches the accessible-events playlist from the official Marseille-Tourisme API, then visits every event page, parses its application/ld+json schema and stores a clean, flat Event record to an Apify dataset.

| Stack | Why |
| --- | --- |
| Apify SDK v3 | Cloud-ready actor runtime, key-value stores & datasets |
| Crawlee 3 · CheerioCrawler | Fast HTTP/HTML scraping with built-in queue |
| TypeScript | Strong typing (see src/types.ts → Event) |
| dayjs | Elegant date maths for the playlist facet |

✨ Features

  • One-shot playlist POST – size = maxEvents, start = 0.
  • Dynamic date-window facet: start = today 00:00, end = today + monthsAhead (end of month).
  • Only events with wheelchair criteria are requested.
  • Extracts title, description, dates, venue, geo & images from the schema graph, deduplicating overlapping WebPage/Event fields.
  • Output dataset contains tidy Event objects (see schema below).
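The one-shot request and date window described above could be built roughly as follows. This is a minimal sketch, not the actor's code: the body field names `dateStart` and `dateEnd` are hypothetical (the real Marseille-Tourisme payload is not documented here), the built-in `Date` stands in for dayjs so the snippet runs without dependencies, and "end of month" is assumed to mean the last day of the month reached after `monthsAhead` months.

```typescript
// Hypothetical body shape: dateStart/dateEnd are illustrative names,
// not taken from the Marseille-Tourisme API.
interface PlaylistBody {
  size: number;      // = maxEvents
  start: number;     // always 0 (one-shot request)
  dateStart: string; // today, as YYYY-MM-DD
  dateEnd: string;   // last day of the month reached after monthsAhead months
}

// Zero-pad a day or month number to two digits.
function pad2(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

// Format a Date as YYYY-MM-DD from its local calendar fields.
function toIsoDay(d: Date): string {
  return d.getFullYear() + "-" + pad2(d.getMonth() + 1) + "-" + pad2(d.getDate());
}

function buildPlaylistBody(
  maxEvents: number,
  monthsAhead: number,
  now: Date = new Date(),
): PlaylistBody {
  // Start of the window: today at 00:00 local time.
  const windowStart = new Date(now.getFullYear(), now.getMonth(), now.getDate());
  // Day 0 of the following month is the last day of the target month;
  // Date normalises month overflow into the next year.
  const windowEnd = new Date(now.getFullYear(), now.getMonth() + monthsAhead + 1, 0);
  return {
    size: maxEvents,
    start: 0,
    dateStart: toIsoDay(windowStart),
    dateEnd: toIsoDay(windowEnd),
  };
}
```

Passing `now` explicitly keeps the function deterministic in tests; in the actor it would default to the current date.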

📦 Project structure

src/
  main.ts          ⇠ actor entry – seeds playlist POST & starts crawler
  routes.ts        ⇠ Cheerio router (PLAYLIST + EVENT_PAGE)
  types.ts         ⇠ export interface Event
package.json
README.md          ⇠ you are here
apify.json         ⇠ actor manifest
INPUT_SCHEMA.json  ⇠ UI + validation for actor input

🔧 Input

| Field | Type | Default | Prefill | Description |
| --- | --- | --- | --- | --- |
| maxEvents | integer | 1000 | 42 | Max number of events to request in the playlist POST |
| monthsAhead | integer | 3 | – | Date-window length (today → today + N months) |

INPUT_SCHEMA.json enforces both fields; maxEvents is required.

Example input.json

{
  "maxEvents": 42,
  "monthsAhead": 3
}
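How the actor reads this input is not shown in the README. The sketch below is one possible way to apply the documented defaults (1000 and 3) and reject malformed values. The helper names `positiveInt` and `resolveInput` are hypothetical, and the "positive integer" lower bound is an assumption rather than something stated in INPUT_SCHEMA.json.

```typescript
interface RawInput {
  maxEvents?: unknown;
  monthsAhead?: unknown;
}

interface ActorInput {
  maxEvents: number;
  monthsAhead: number;
}

// Accept a positive integer, or fall back to the documented default when the
// field is missing. Anything else is rejected with an error.
function positiveInt(value: unknown, fallback: number, field: string): number {
  if (value === undefined || value === null) return fallback;
  if (
    typeof value !== "number" ||
    !isFinite(value) ||
    Math.floor(value) !== value ||
    value < 1
  ) {
    throw new Error(field + " must be a positive integer");
  }
  return value;
}

// `raw` may be null, which is what Actor.getInput() resolves to when no input is set.
function resolveInput(raw: RawInput | null): ActorInput {
  const input: RawInput = raw ?? {};
  return {
    maxEvents: positiveInt(input.maxEvents, 1000, "maxEvents"),
    monthsAhead: positiveInt(input.monthsAhead, 3, "monthsAhead"),
  };
}
```

With the example input above, `resolveInput({ maxEvents: 42, monthsAhead: 3 })` returns the same two values unchanged. An empty or null input falls back to 1000 and 3.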

▶️ Run locally

# install deps
npm install
# optional: build once
npm run build
# run with the example input
apify run -i input.json # or npm start

Add --purge to clear previous datasets / queues:

apify run --purge -i input.json

🗂 Output dataset (Event)

interface Event {
  url: string; // canonical event URL
  name: string | null;
  description?: string;
  startDate?: string; // ISO YYYY-MM-DD
  endDate?: string;
  venue?: string;
  address?: string;
  city?: string;
  postalCode?: string;
  latitude?: string;
  longitude?: string;
  images?: string[];
}

Empty / duplicate fields are removed before the record is pushed.
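To illustrate how a schema graph might be flattened into such a record, here is a hedged sketch rather than the code in src/routes.ts. It assumes the page's JSON-LD uses standard schema.org properties (`location`, `address`, `geo`, `image`), prefers the Event node and falls back to the WebPage node, and trims dates to their first ten characters. The interface is renamed `EventRecord` only to avoid clashing with the DOM's global `Event` type, and all helper names are hypothetical.

```typescript
// Mirrors the README's Event interface under a non-clashing name.
interface EventRecord {
  url: string;
  name: string | null;
  description?: string;
  startDate?: string;
  endDate?: string;
  venue?: string;
  address?: string;
  city?: string;
  postalCode?: string;
  latitude?: string;
  longitude?: string;
  images?: string[];
}

type LdNode = Record<string, any>;
type StringField = Exclude<keyof EventRecord, "url" | "name" | "images">;

// Trimmed string for strings and numbers; undefined for empty or non-scalar values.
function cleanText(value: unknown): string | undefined {
  if (typeof value !== "string" && typeof value !== "number") return undefined;
  const s = String(value).trim();
  return s === "" ? undefined : s;
}

// First node whose @type equals (or includes) the given type, else an empty node.
function firstOfType(graph: LdNode[], type: string): LdNode {
  for (const node of graph) {
    const t = node["@type"];
    if (t === type || (Array.isArray(t) && t.indexOf(type) !== -1)) return node;
  }
  return {};
}

// Accepts a URL string, an ImageObject-like node, or an array of either; drops repeats.
function collectImages(value: unknown): string[] {
  const items: unknown[] = Array.isArray(value) ? value : [value];
  const urls: string[] = [];
  for (const item of items) {
    const u = cleanText(typeof item === "object" && item !== null ? (item as LdNode).url : item);
    if (u !== undefined && urls.indexOf(u) === -1) urls.push(u);
  }
  return urls;
}

function toEventRecord(url: string, ldJson: string): EventRecord {
  const parsed: LdNode = JSON.parse(ldJson);
  const graph: LdNode[] = Array.isArray(parsed)
    ? parsed
    : Array.isArray(parsed["@graph"]) ? parsed["@graph"] : [parsed];
  const ev = firstOfType(graph, "Event");
  const page = firstOfType(graph, "WebPage");
  const place: LdNode = ev.location ?? {};
  const addr: LdNode = place.address ?? {};
  const geo: LdNode = place.geo ?? {};

  const record: EventRecord = { url, name: cleanText(ev.name) ?? cleanText(page.name) ?? null };
  const fields: Record<StringField, unknown> = {
    description: cleanText(ev.description) ?? page.description, // Event first, WebPage as fallback
    startDate: cleanText(ev.startDate)?.slice(0, 10), // assumes values start with YYYY-MM-DD
    endDate: cleanText(ev.endDate)?.slice(0, 10),
    venue: place.name,
    address: addr.streetAddress,
    city: addr.addressLocality,
    postalCode: addr.postalCode,
    latitude: geo.latitude,
    longitude: geo.longitude,
  };
  // Copy only non-empty values so the pushed record carries no blank fields.
  for (const key of Object.keys(fields) as StringField[]) {
    const v = cleanText(fields[key]);
    if (v !== undefined) record[key] = v;
  }
  const images = collectImages(ev.image);
  if (images.length > 0) record.images = images;
  return record;
}
```

Missing properties are simply left off the record rather than stored as empty strings, which matches the "tidy" output described above. Numeric coordinates are stringified to fit the `latitude?: string` typing.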


🔑 Environment variables

None are required. If you route traffic through the Apify proxy, configure it in apify.json or export APIFY_PROXY_PASSWORD.


πŸ—ΊοΈ High-level flow

Input → build POST body ─┐
                         │
┌─────────────── CheerioCrawler ────────────────┐
│                                               │
PLAYLIST handler                              EVENT_PAGE handler
• parse JSON                                  • parse LD+JSON
• enqueue https:// links (label:EVENT_PAGE)   • build typed Event
│                                               │
└─► RequestQueue 'playlist' ◄───────────────────┘
                         │
Apify Dataset ←─── Event records
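
The PLAYLIST step in the flow above amounts to turning the parsed JSON into labelled requests. A minimal sketch under stated assumptions: the response shape (`items[].url`) and the helper name `toEventRequests` are hypothetical, since the real playlist payload is not described in this README.

```typescript
interface EventRequest {
  url: string;
  label: "EVENT_PAGE";
}

// Hypothetical response shape; the real playlist JSON may be structured differently.
interface PlaylistResponse {
  items?: { url?: unknown }[];
}

// Keep only https:// links, skip repeats, and tag each request for the EVENT_PAGE route.
function toEventRequests(playlist: PlaylistResponse): EventRequest[] {
  const seen: Record<string, boolean> = {};
  const requests: EventRequest[] = [];
  for (const item of playlist.items ?? []) {
    const link = item.url;
    if (typeof link !== "string" || !/^https:\/\//.test(link)) continue;
    if (seen[link]) continue;
    seen[link] = true;
    requests.push({ url: link, label: "EVENT_PAGE" });
  }
  return requests;
}
```

Inside a Crawlee handler, the returned array could be passed to `crawler.addRequests(...)`. The request queue also deduplicates by URL, so the local `seen` check is only a convenience.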

πŸ“ License

MIT © 2025 yfe404