Tours and Activity Unified
Status: Under maintenance

Cross-platform tour and activity price scraper. Unify Viator and GetYourGuide search results into one normalized schema with prices, duration, ratings, review counts, and booking URLs per activity. Built for tour operators and OTA analysts.
Developer: DevilScrapes (maintained by Community)
We do the dirty work so your dataset stays clean. 😈
$3.05 / 1,000 activity rows — Viator and GetYourGuide normalized into one schema. Scrape activity search results across two major experience-booking platforms (Viator and GetYourGuide) for any destination, normalize the data into one Pydantic-validated ResultRow, and push it straight into the Apify dataset. One run gives you a side-by-side view of how each platform prices and ranks the same destination — work that today requires running two single-source scrapers and writing your own normalization layer.
This Actor hits each platform's universal search endpoint, parses the live HTML with parsel, and emits one flat row per activity with platform, activity_id, activity_title, price_original, currency_original, duration_hours, rating, review_count, booking_url, and image_url. Per-platform failure is non-fatal — a block on one platform does not abort the run.
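The per-platform isolation described above can be pictured roughly as follows. This is a minimal sketch, not the Actor's actual code: `scrape_all` and the `Scraper` callable signature are hypothetical names introduced for illustration.

```python
import logging
from typing import Callable

logger = logging.getLogger("tours_unified")

# Hypothetical signature: a scraper takes (query, limit) and returns row dicts.
Scraper = Callable[[str, int], list[dict]]


def scrape_all(query: str, limit: int, scrapers: dict[str, Scraper]) -> list[dict]:
    """Run every platform scraper; a failure on one never aborts the others."""
    rows: list[dict] = []
    for platform, scrape in scrapers.items():
        try:
            # Enforce the per-platform cap even if the scraper over-delivers.
            rows.extend(scrape(query, limit)[:limit])
        except Exception as exc:  # deliberately broad: isolate any platform failure
            logger.warning("%s failed, continuing: %s", platform, exc)
    return rows
```

A blocked platform then contributes zero rows and a warning, while the remaining platforms still fill the dataset.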
🎯 What this scrapes
Two major tours-and-activities platforms, one schema:
- Viator: `viator.com/searchResults/all?text=<query>` (server-rendered HTML, 24 cards per page, `data-automation` test attributes)
- GetYourGuide: `getyourguide.com/s/?q=<query>` (Vue.js shell with SSR card content, 24 cards per page)
- Klook (v2 upgrade, currently returns 0 rows): `klook.com/search/?keyword=<query>` (gated by DataDome; requires browser rendering. Documented as a future upgrade behind Camoufox.)
Output rows carry every field needed for cross-platform comparison:
| Field | Type | Description |
|---|---|---|
| `platform` | string | Platform literal (`viator`, `getyourguide`, `klook`) |
| `activity_id` | string | Platform-canonical activity ID |
| `activity_title` | string | Card title text |
| `location_query` | string | Echo of the user input; useful when batching cities |
| `location_city` | string \| null | Best-effort city parsed from URL or query |
| `location_country` | string \| null | Best-effort country |
| `price_usd` | number \| null | USD price when the platform itself displays USD |
| `currency_original` | string \| null | ISO 4217 code parsed from the price symbol |
| `price_original` | number \| null | Numeric price in the displayed currency |
| `duration_hours` | number \| null | Activity duration in hours; midpoint for ranges |
| `rating` | number \| null | Star rating, 0-5 scale |
| `review_count` | integer \| null | Number of reviews |
| `operator_name` | string \| null | Tour operator/supplier when surfaced |
| `category` | string \| null | Card tag (tour, experience, ...) |
| `booking_url` | string | Absolute URL to the activity page |
| `image_url` | string \| null | Absolute URL to the primary thumbnail |
| `scraped_at` | string | ISO 8601 UTC timestamp |
🔥 Features
- Two platforms, one schema: drop the dataset straight into a spreadsheet or BI tool with no per-platform normalization.
- Currency-aware: symbol-to-ISO mapping (€/$/£/¥ -> EUR/USD/GBP/JPY); `price_usd` is populated only when the platform itself displays USD.
- Duration parser handles ranges: `"5 to 9 hours"` becomes `7.0` (midpoint); `"30 minutes"` becomes `0.5`; `"1 day"` becomes `24.0`.
- Per-platform isolation: one platform's failure (Cloudflare block, page restructure) does not abort the run; the others still produce data.
- No browser: pure HTTP with `curl-cffi` browser TLS fingerprint impersonation; low compute footprint and fast runs.
- Pydantic v2 validation: input and output are model-validated; invalid input fails fast with a clear error before any network call.
- Optional Apify Proxy: `BUYPROXIES94952` routing on by default to clear GetYourGuide's Cloudflare-fronted edge.
- Exponential backoff on `408/429/503` with `Retry-After` honoured.
- Configurable cap: `maxPerPlatform` lets you cap each platform at 1-100 rows per run.
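To make the retry behaviour concrete, here is one way exponential backoff with `Retry-After` support could look. It is a sketch under assumptions, not the Actor's implementation: `backoff_delay`, `fetch_with_retry`, the 1-second base, the 60-second cap, and the four-attempt limit are all hypothetical, and only the numeric (delta-seconds) form of `Retry-After` is handled.

```python
import time
from typing import Callable, Optional

RETRYABLE = {408, 429, 503}  # status codes named in the feature list


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before the next try; `attempt` is 0-based."""
    if retry_after is not None:
        try:
            # Honour a numeric Retry-After header as-is.
            return max(float(retry_after), 0.0)
        except ValueError:
            pass  # HTTP-date form is not handled in this sketch
    return min(base * (2 ** attempt), cap)


def fetch_with_retry(send: Callable[[], tuple[int, dict, str]],
                     max_attempts: int = 4,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Call `send` until it returns a non-retryable status or attempts run out.

    `send` is a hypothetical callable returning (status, headers, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status not in RETRYABLE:
            return body
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt, headers.get("Retry-After")))
    raise RuntimeError(f"gave up after {max_attempts} attempts (last status {status})")
```

Passing `send` and `sleep` as parameters keeps the retry policy testable without network access or real waiting.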
💡 Use cases
- Tour operator competitive intelligence: find every tour your competitors list in your city, compare prices, ratings, and durations side-by-side.
- OTA cross-platform analyst dashboards: feed a BI tool with snapshots of how Viator and GetYourGuide each rank a destination.
- Dynamic pricing strategy: track how the same `activity_id` is priced on each platform and adjust your own listings.
- Destination intelligence reports: pull "Paris" or "Tokyo" weekly into a named dataset and chart price drift over time.
- Inbound-tour-builder market research: discover which experiences dominate the first-page search results in a new destination.
- Travel-blogger affiliate research: surface high-rating, high-review-count activities to feature in destination guides.
- Travel-tech investor diligence: track top-of-funnel pricing across the experience-booking layer of the travel stack.
⚙️ How to use it
- Open the Actor input form.
- Type a Destination (`Paris`, `New York`, `Tokyo`, `Bali`, ...).
- (Optional) Pick Platforms: leave empty to scrape all three, or list a subset like `["viator"]` or `["getyourguide"]`. Klook returns 0 rows in v1 (documented limitation).
- (Optional) Set Max rows per platform: default 20, max 100.
- Leave Use Apify Proxy on (default) for cleaner exit IPs when GetYourGuide is fronted by Cloudflare.
- Click Start. Results stream into the default dataset.
Quick examples
Both supported platforms, default cap:

```json
{"locationQuery": "Paris"}
```

GetYourGuide only, 5 rows (the QA fixture, the fastest path to confirm output shape):

```json
{"locationQuery": "Paris", "platforms": ["getyourguide"], "maxPerPlatform": 5, "useProxy": false}
```

Viator only, cap of 50 rows, direct routing:

```json
{"locationQuery": "Tokyo", "platforms": ["viator"], "maxPerPlatform": 50, "useProxy": false}
```
📥 Input
| JSON key | Type | Default | Description |
|---|---|---|---|
| `locationQuery` | string | (required) | Destination text query (e.g. "Paris") |
| `platforms` | array of literals | `[]` (= all 3) | Subset of `viator` / `getyourguide` / `klook` |
| `maxPerPlatform` | integer | `20` | Cap on rows per platform (1-100) |
| `useProxy` | boolean | `true` | Route via Apify Proxy `BUYPROXIES94952` |
locationQuery is the only required field. Whitespace is stripped; blank values are rejected up-front by Pydantic before any network call.
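The validation rules in the table can be sketched as below. The Actor itself uses Pydantic v2; this illustration deliberately swaps in a standard-library dataclass so it runs without dependencies, and the class name `ActorInput` and the error messages are assumptions.

```python
from dataclasses import dataclass, field

ALL_PLATFORMS = ("viator", "getyourguide", "klook")


@dataclass
class ActorInput:
    locationQuery: str
    platforms: list[str] = field(default_factory=list)
    maxPerPlatform: int = 20
    useProxy: bool = True

    def __post_init__(self) -> None:
        # Strip whitespace, then reject blank queries before any network call.
        self.locationQuery = self.locationQuery.strip()
        if not self.locationQuery:
            raise ValueError("locationQuery must not be blank")
        unknown = [p for p in self.platforms if p not in ALL_PLATFORMS]
        if unknown:
            raise ValueError(f"unknown platforms: {unknown}")
        if not 1 <= self.maxPerPlatform <= 100:
            raise ValueError("maxPerPlatform must be between 1 and 100")
        if not self.platforms:
            # An empty list means "all platforms".
            self.platforms = list(ALL_PLATFORMS)
```

For example, `ActorInput(locationQuery="  Paris ")` normalizes the query to `"Paris"` and expands `platforms` to all three literals, while a blank query raises `ValueError`.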
📤 Output
One row per activity. See the What this scrapes table above for the full schema.
```json
{
  "platform": "getyourguide",
  "activity_id": "508441",
  "activity_title": "Paris: Le Marais Guided Food Tour with Tastings",
  "location_query": "Paris",
  "location_city": "Paris",
  "location_country": null,
  "price_usd": null,
  "currency_original": "EUR",
  "price_original": 69.0,
  "duration_hours": 3.0,
  "rating": 4.9,
  "review_count": 506,
  "operator_name": null,
  "category": "experience",
  "booking_url": "https://www.getyourguide.com/paris-l16/no-diet-club-unique-local-food-tour-in-paris-le-marais-t508441/",
  "image_url": "https://cdn.getyourguide.com/image/.../tour_img/7b9edf635985a601.jpeg",
  "scraped_at": "2026-05-16T22:00:00.000Z"
}
```
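As an illustration of the side-by-side comparison these rows enable, the sketch below groups exported rows by platform and displayed currency and reports a median price. The function name `summarize` is hypothetical. Grouping by currency is a choice made here because the Actor does no FX conversion, so prices in different currencies are not directly comparable.

```python
from collections import defaultdict
from statistics import median


def summarize(rows: list[dict]) -> dict[tuple[str, str], dict]:
    """Group rows by (platform, currency); report row count and median price."""
    prices: dict[tuple[str, str], list[float]] = defaultdict(list)
    for row in rows:
        price = row.get("price_original")
        currency = row.get("currency_original")
        if price is None or currency is None:
            continue  # both fields are nullable: skip rows with no parsed price
        prices[(row["platform"], currency)].append(price)
    return {key: {"count": len(vals), "median_price": median(vals)}
            for key, vals in prices.items()}
```

Feeding it the dataset items from one run yields one summary entry per platform and currency pair, which is usually enough for a first look before loading the data into a BI tool.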
💰 Pricing
Pay-Per-Event (PPE):
| Event | Rate | Trigger |
|---|---|---|
| `actor-start` | $0.05 | Once per run at Actor boot |
| `result-row` | $0.003 | Per activity row emitted |
Typical run cost (default maxPerPlatform=20, 2 working platforms, ~40 rows): ~$0.17.
Per 1,000 rows extrapolated: ~$3.05.
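The two figures above follow directly from the rate table. A small sketch of the arithmetic, using `Decimal` to avoid floating-point rounding (the function name `run_cost` is an assumption):

```python
from decimal import Decimal

ACTOR_START = Decimal("0.05")   # charged once per run
RESULT_ROW = Decimal("0.003")   # charged per emitted row


def run_cost(rows: int) -> Decimal:
    """Estimated PPE charge in USD for a single run emitting `rows` rows."""
    if rows < 0:
        raise ValueError("rows must be non-negative")
    return ACTOR_START + RESULT_ROW * rows
```

`run_cost(40)` gives $0.17 and `run_cost(1000)` gives $3.05. Note that the $3.05 figure spreads a single start fee over 1,000 rows; since `actor-start` is charged per run, collecting 1,000 rows through 25 default-sized runs of about 40 rows would come to 25 × $0.17 = $4.25 at these rates.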
🚧 Limitations
- Klook returns 0 rows in v1: the search endpoint is gated by DataDome anti-bot, which requires full browser execution to clear. We documented this up front and ship without it rather than over-promise; v2 will add a Camoufox path behind a feature flag.
- First page only: no pagination across multiple result pages. Each platform returns ~20-24 cards on the first page and the default cap is 20, so a `maxPerPlatform` above the first-page size still yields at most one page of rows.
- No detail-page scraping: we scrape the search-results surface only. Itineraries, photo galleries, availability calendars, and meeting points are out of scope for v1.
- Currency follows the platform's display: `price_usd` is populated only when the platform itself displays USD. We do not run our own FX conversion.
- Search relevance is the platform's: "Paris" can include cards for Versailles or nearby destinations, depending on each platform's relevance engine.
- Apify free-tier residential proxy is limited: `BUYPROXIES94952` is the only proxy group provisioned on this account, which is sufficient at this Actor's scale.
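The currency rule above can be sketched as follows. This is an illustration, not the Actor's parser: `parse_price` and its regular expression are hypothetical, and the sketch assumes the symbol precedes the amount and that commas are thousands separators. A bare `$` is treated as USD here, mirroring the symbol-to-ISO mapping in the feature list.

```python
import re

SYMBOL_TO_ISO = {"€": "EUR", "$": "USD", "£": "GBP", "¥": "JPY"}
_PRICE_RE = re.compile(r"([€$£¥])\s*(\d[\d,]*(?:\.\d+)?)")


def parse_price(text: str) -> dict:
    """Map a displayed price such as 'From €69' onto the three price fields."""
    match = _PRICE_RE.search(text)
    if match is None:
        # Unparseable prices stay null rather than failing the row.
        return {"currency_original": None, "price_original": None, "price_usd": None}
    currency = SYMBOL_TO_ISO[match.group(1)]
    amount = float(match.group(2).replace(",", ""))
    return {
        "currency_original": currency,
        "price_original": amount,
        # No FX conversion: price_usd is filled only for USD-displayed prices.
        "price_usd": amount if currency == "USD" else None,
    }
```

With this shape, a EUR card keeps `price_usd` as `null`, matching the sample output row, and any conversion to a common currency is left to the consumer of the dataset.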
❓ FAQ
Q: Why is Klook in the schema but returns 0 rows?
A: Klook gates every meaningful endpoint behind DataDome JS challenges that curl-cffi cannot clear without a full browser. Adding Klook would require Camoufox, which costs ~10x the compute of HTTP scraping. We kept the platform literal in the schema so v2 can land without breaking the dataset shape — but for v1, every Klook call returns [] with a WARNING. Use platforms: ["viator", "getyourguide"] to skip the wasted call entirely.
Q: Why default useProxy: true?
A: GetYourGuide is fronted by Cloudflare and occasionally throttles datacenter IPs. The default-on posture trades a small amount of latency for a materially higher first-page success rate. If you are running on a clean residential network, you can set it to false.
Q: How is duration_hours parsed?
A: Viator and GetYourGuide write durations in a handful of formats: "3 hours", "1 hour", "30 minutes", "5 to 9 hours", "1 day", "2.5 hours". We parse all of them. For ranges, we use the midpoint ("5 to 9 hours" -> 7.0). Anything unparseable stays null rather than crashing the row.
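A minimal sketch of a parser covering the formats listed in this answer. The function name and regular expression are assumptions rather than the Actor's code, and compound strings such as "3 hours 30 minutes" are not handled (only the first quantity would be read).

```python
import re
from typing import Optional

# Minutes per unit; dividing by 60 at the end keeps common values exact.
_UNIT_MINUTES = {"minute": 1, "hour": 60, "day": 1440}
_DURATION_RE = re.compile(
    r"(\d+(?:\.\d+)?)(?:\s*(?:to|-)\s*(\d+(?:\.\d+)?))?\s*(minute|hour|day)s?",
    re.IGNORECASE,
)


def parse_duration_hours(text: str) -> Optional[float]:
    """Return the duration in hours, the midpoint for ranges, or None."""
    match = _DURATION_RE.search(text)
    if match is None:
        return None  # unparseable input stays null instead of raising
    low = float(match.group(1))
    high = float(match.group(2)) if match.group(2) else low
    midpoint = (low + high) / 2
    return midpoint * _UNIT_MINUTES[match.group(3).lower()] / 60
```

Under these assumptions, `"5 to 9 hours"` parses to `7.0`, `"30 minutes"` to `0.5`, and `"1 day"` to `24.0`, in line with the values quoted in the feature list.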
Q: Will the Actor re-query the same activity across runs?
A: Yes — each run is independent. To track an activity over time, schedule periodic runs and write to a named dataset (Actor.open_dataset(name=...)) or export to your warehouse. The Apify default dataset retention is 7 days.
Q: Why no detail-page scraping?
A: Each detail page is a heavier scrape (cancellation policy, photos, availability), and many are Cloudflare-protected. v1 ships the breadth-first surface-level price intel that 80% of buyers actually need; detail-page scraping is on the v2 roadmap.
Q: Can I batch multiple cities in one run?
A: Not in v1; each run takes one locationQuery. To batch, schedule one Actor task per city; tasks run in parallel up to your plan's concurrent-run cap, including on the free tier. Each result row carries location_query so downstream pivots stay correct.
💬 Your feedback
Found a parser glitch or want a platform added? Open an issue on the Actor's Apify Store page. Pull requests welcome via the source code repository linked in the Actor's metadata.