Tours and Activity Unified

Under maintenance

Cross-platform tour and activity price scraper. Unify Viator and GetYourGuide search results into one normalized schema with prices, duration, ratings, review counts, and booking URLs per activity. Built for tour operators and OTA analysts.

Developer: DevilScrapes (Maintained by Community)

We do the dirty work so your dataset stays clean. 😈

$3.05 / 1,000 activity rows: Viator and GetYourGuide normalized into one schema.

Scrape activity search results across two major experience-booking platforms (Viator and GetYourGuide) for any destination, normalize the data into one Pydantic-validated ResultRow, and push it straight into the Apify dataset. One run gives you a side-by-side view of how each platform prices and ranks the same destination. Today that work requires running two single-source scrapers and writing your own normalization layer.

This Actor hits each platform's universal search endpoint, parses the live HTML with parsel, and emits one flat row per activity with platform, activity_id, activity_title, price_original, currency_original, duration_hours, rating, review_count, booking_url, and image_url. Per-platform failure is non-fatal — a block on one platform does not abort the run.
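
As a rough illustration of the per-platform isolation described above, the sketch below wraps each platform call in its own try/except so that one failure cannot abort the run. The names `run_platforms`, `fake_viator`, and `fake_getyourguide` are hypothetical stand-ins, not the Actor's actual code.

```python
import logging
from typing import Callable

logger = logging.getLogger("tours_unified_sketch")

Row = dict[str, object]
# A scraper takes (query, cap) and returns a list of rows. Hypothetical signature.
Scraper = Callable[[str, int], list[Row]]


def run_platforms(scrapers: dict[str, Scraper], query: str, cap: int) -> list[Row]:
    """Run every platform scraper; a failure in one never aborts the others."""
    rows: list[Row] = []
    for name, scrape in scrapers.items():
        try:
            platform_rows = scrape(query, cap)[:cap]
        except Exception as exc:  # block, page restructure, timeout, ...
            logger.warning("platform %s failed, continuing: %s", name, exc)
            continue
        rows.extend(platform_rows)
    return rows


# Hypothetical stand-ins for the real scrapers.
def fake_viator(query: str, cap: int) -> list[Row]:
    return [{"platform": "viator", "activity_title": f"{query} walking tour"}]


def fake_getyourguide(query: str, cap: int) -> list[Row]:
    raise RuntimeError("403 from edge")


rows = run_platforms(
    {"viator": fake_viator, "getyourguide": fake_getyourguide}, "Paris", 20
)
```

Here the simulated GetYourGuide failure is logged as a warning and the Viator row is still returned.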

🎯 What this scrapes

Two supported tours-and-activities platforms (plus one planned), one schema:

  1. Viator: viator.com/searchResults/all?text=<query> (server-rendered HTML, 24 cards per page, data-automation test attributes)
  2. GetYourGuide: getyourguide.com/s/?q=<query> (Vue.js shell with SSR card content, 24 cards per page)
  3. Klook (v2 upgrade, currently returns 0 rows): klook.com/search/?keyword=<query> (gated by DataDome; requires browser rendering. Documented as a future upgrade behind Camoufox.)

Output rows carry every field needed for cross-platform comparison:

Field              Type            Description
platform           string          Platform literal (viator, getyourguide, klook)
activity_id        string          Platform-canonical activity ID
activity_title     string          Card title text
location_query     string          Echo of the user input; useful when batching cities
location_city      string | null   Best-effort city parsed from URL or query
location_country   string | null   Best-effort country
price_usd          number | null   USD price when the platform itself displays USD
currency_original  string | null   ISO 4217 code parsed from the price symbol
price_original     number | null   Numeric price in the displayed currency
duration_hours     number | null   Activity duration in hours; midpoint for ranges
rating             number | null   Star rating, 0-5 scale
review_count       integer | null  Number of reviews
operator_name      string | null   Tour operator/supplier when surfaced
category           string | null   Card tag (tour, experience, ...)
booking_url        string          Absolute URL to the activity page
image_url          string | null   Absolute URL to the primary thumbnail
scraped_at         string          ISO 8601 UTC timestamp
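
To make the relationship between price_original, currency_original, and price_usd concrete, here is a minimal sketch of symbol-based price parsing. It assumes the symbol precedes the amount and that commas are thousands separators; `parse_price` and its regex are illustrative, not the Actor's actual parser.

```python
import re
from typing import Optional

# Symbol-to-ISO mapping as listed in this README.
SYMBOL_TO_ISO = {"€": "EUR", "$": "USD", "£": "GBP", "¥": "JPY"}

# Assumption: symbol first, then digits with optional thousands commas and decimals.
_PRICE_RE = re.compile(r"([€$£¥])\s*(\d[\d,]*(?:\.\d+)?)")


def parse_price(text: str) -> tuple[Optional[float], Optional[str], Optional[float]]:
    """Return (price_original, currency_original, price_usd) for a price label."""
    match = _PRICE_RE.search(text)
    if match is None:
        return None, None, None
    currency = SYMBOL_TO_ISO[match.group(1)]
    amount = float(match.group(2).replace(",", ""))
    # No FX conversion: price_usd is filled only when the label is already in USD.
    price_usd = amount if currency == "USD" else None
    return amount, currency, price_usd
```

For example, `parse_price("From €69")` yields `(69.0, "EUR", None)`, matching the sample output row where price_usd stays null for a EUR listing.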

🔥 Features

  • Two platforms, one schema: drop the dataset straight into a spreadsheet or BI tool with no per-platform normalization.
  • Currency-aware: symbol-to-ISO mapping (€/$/£/¥ -> EUR/USD/GBP/JPY); price_usd is populated only when the platform itself displays USD.
  • Duration parser handles ranges: "5 to 9 hours" becomes 7.0 (midpoint); "30 minutes" becomes 0.5; "1 day" becomes 24.0.
  • Per-platform isolation: one platform's failure (Cloudflare block, page restructure) does not abort the run; the others still produce data.
  • No browser: pure HTTP with curl-cffi browser TLS fingerprint impersonation; low compute footprint and fast runs.
  • Pydantic v2 validation: input and output are model-validated; invalid input fails fast with a clear error before any network call.
  • Optional Apify Proxy: BUYPROXIES94952 routing is on by default to clear GetYourGuide's Cloudflare-fronted edge.
  • Exponential backoff on 408 / 429 / 503, with Retry-After honoured.
  • Configurable cap: maxPerPlatform lets you cap each platform at 1-100 rows per run.
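
The backoff behaviour in the list above can be sketched as a small pure function. This is a hedged illustration only: the base delay, the 30-second cap, and the name `retry_delay` are assumptions, and it handles only the delta-seconds form of Retry-After (not the HTTP-date form).

```python
from typing import Mapping, Optional

RETRYABLE_STATUSES = {408, 429, 503}


def retry_delay(
    status: int,
    headers: Mapping[str, str],
    attempt: int,
    base: float = 1.0,   # assumed base delay in seconds
    cap: float = 30.0,   # assumed upper bound for the exponential delay
) -> Optional[float]:
    """Seconds to wait before the next attempt, or None if not retryable."""
    if status not in RETRYABLE_STATUSES:
        return None
    # Assumes header keys are already normalized to "Retry-After".
    retry_after = headers.get("Retry-After", "").strip()
    if retry_after.isdigit():
        return float(retry_after)  # the server's hint wins over our own schedule
    return min(base * (2 ** attempt), cap)
```

With these assumed defaults, attempts 0, 1, 2, 3 wait 1, 2, 4, and 8 seconds unless the server supplies its own Retry-After value.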

💡 Use cases

  • Tour operator competitive intelligence — find every tour your competitors list in your city, compare prices, ratings, and durations side-by-side.
  • OTA cross-platform analyst dashboards — feed a BI tool with snapshots of how Viator and GetYourGuide each rank a destination.
  • Dynamic pricing strategy — track how the same activity_id is priced on each platform and adjust your own listings.
  • Destination intelligence reports — pull "Paris" or "Tokyo" weekly into a named dataset and chart price drift over time.
  • Inbound-tour-builder market research — discover which experiences dominate the first-page search results in a new destination.
  • Travel-blogger affiliate research — surface high-rating, high-review-count activities to feature in destination guides.
  • Travel-tech investor diligence — track top-of-funnel pricing across the experience-booking layer of the travel stack.

⚙️ How to use it

  1. Open the Actor input form.
  2. Type a Destination (Paris, New York, Tokyo, Bali, ...).
  3. (Optional) Pick Platforms — leave empty to scrape all three, or list a subset like ["viator"] or ["getyourguide"]. Klook returns 0 rows in v1 (documented limitation).
  4. (Optional) Set Max rows per platform — default 20, max 100.
  5. Leave Use Apify Proxy on (default) for cleaner exit IPs when GetYourGuide is fronted by Cloudflare.
  6. Click Start. Results stream into the default dataset.

Quick examples

Both supported platforms, default cap:

{
  "locationQuery": "Paris"
}

GetYourGuide only, 5 rows (the QA fixture — fastest path to confirm output shape):

{
  "locationQuery": "Paris",
  "platforms": ["getyourguide"],
  "maxPerPlatform": 5,
  "useProxy": false
}

Viator only, up to 50 rows, direct routing:

{
  "locationQuery": "Tokyo",
  "platforms": ["viator"],
  "maxPerPlatform": 50,
  "useProxy": false
}

📥 Input

JSON key        Type              Default       Description
locationQuery   string            (required)    Destination text query (e.g. "Paris")
platforms       array of literal  [] (= all 3)  Subset of viator / getyourguide / klook
maxPerPlatform  integer           20            Cap on rows per platform (1-100)
useProxy        boolean           true          Route via Apify Proxy BUYPROXIES94952

locationQuery is the only required field. Whitespace is stripped; blank values are rejected up-front by Pydantic before any network call.
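
The validation rules above can be mirrored in a few lines. The Actor itself uses Pydantic; the sketch below uses a standard-library dataclass purely as a dependency-free stand-in, and the class name `ActorInput` is hypothetical. Attribute names keep the JSON keys' camelCase for easy comparison.

```python
from dataclasses import dataclass, field

ALLOWED_PLATFORMS = ("viator", "getyourguide", "klook")


@dataclass
class ActorInput:
    locationQuery: str
    platforms: list[str] = field(default_factory=list)
    maxPerPlatform: int = 20
    useProxy: bool = True

    def __post_init__(self) -> None:
        # Strip whitespace, then reject blank queries before any network call.
        self.locationQuery = self.locationQuery.strip()
        if not self.locationQuery:
            raise ValueError("locationQuery must not be blank")
        unknown = [p for p in self.platforms if p not in ALLOWED_PLATFORMS]
        if unknown:
            raise ValueError(f"unknown platforms: {unknown}")
        if not self.platforms:  # an empty list means all platforms
            self.platforms = list(ALLOWED_PLATFORMS)
        if not 1 <= self.maxPerPlatform <= 100:
            raise ValueError("maxPerPlatform must be between 1 and 100")
```

Constructing `ActorInput(locationQuery="  Paris ")` gives a stripped query with all three platforms selected, while a whitespace-only query raises `ValueError` immediately.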

📤 Output

One row per activity. See the What this scrapes table above for the full schema.

{
  "platform": "getyourguide",
  "activity_id": "508441",
  "activity_title": "Paris: Le Marais Guided Food Tour with Tastings",
  "location_query": "Paris",
  "location_city": "Paris",
  "location_country": null,
  "price_usd": null,
  "currency_original": "EUR",
  "price_original": 69.0,
  "duration_hours": 3.0,
  "rating": 4.9,
  "review_count": 506,
  "operator_name": null,
  "category": "experience",
  "booking_url": "https://www.getyourguide.com/paris-l16/no-diet-club-unique-local-food-tour-in-paris-le-marais-t508441/",
  "image_url": "https://cdn.getyourguide.com/image/.../tour_img/7b9edf635985a601.jpeg",
  "scraped_at": "2026-05-16T22:00:00.000Z"
}
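
Once rows like the one above are exported, a side-by-side comparison takes only a few lines on the consumer side. The sketch below groups by platform and displayed currency, since the Actor performs no FX conversion. The function name and the sample prices are made up for illustration.

```python
from collections import defaultdict
from statistics import median


def median_price_by_platform(rows: list[dict]) -> dict[tuple[str, str], float]:
    """Median displayed price per (platform, currency_original) pair."""
    buckets: dict[tuple[str, str], list[float]] = defaultdict(list)
    for row in rows:
        price = row.get("price_original")
        currency = row.get("currency_original")
        if price is None or currency is None:
            continue  # skip rows without a parseable price
        buckets[(row["platform"], currency)].append(price)
    return {key: median(prices) for key, prices in buckets.items()}


# Illustrative rows only; real rows carry the full schema.
sample_rows = [
    {"platform": "getyourguide", "currency_original": "EUR", "price_original": 69.0},
    {"platform": "getyourguide", "currency_original": "EUR", "price_original": 49.0},
    {"platform": "viator", "currency_original": "EUR", "price_original": 75.0},
    {"platform": "viator", "currency_original": None, "price_original": None},
]
summary = median_price_by_platform(sample_rows)
```

Keeping the currency in the grouping key avoids accidentally averaging EUR and USD prices together.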

💰 Pricing

Pay-Per-Event (PPE):

Event        Rate    Trigger
actor-start  $0.05   Once per run at Actor boot
result-row   $0.003  Per activity row emitted

Typical run cost (default maxPerPlatform=20, 2 working platforms, ~40 rows): ~$0.17. Per 1,000 rows extrapolated: ~$3.05.
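
The figures above follow directly from the two event rates. A small sketch of the arithmetic (the function name `estimate_run_cost` is illustrative):

```python
ACTOR_START_USD = 0.05   # charged once per run
RESULT_ROW_USD = 0.003   # charged per emitted activity row


def estimate_run_cost(rows: int) -> float:
    """Estimated pay-per-event cost in USD for a single run emitting `rows` rows."""
    return round(ACTOR_START_USD + rows * RESULT_ROW_USD, 4)


typical = estimate_run_cost(40)        # 0.05 + 40 * 0.003
per_thousand = estimate_run_cost(1000)  # 0.05 + 1000 * 0.003
```

Note that the per-1,000 figure assumes a single actor-start charge; reaching 1,000 rows across many small runs adds one $0.05 start fee per run.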

🚧 Limitations

  • Klook returns 0 rows in v1: the search endpoint is gated by DataDome anti-bot, which requires full browser execution to clear. We documented this up front and ship without it rather than over-promise; v2 will add a Camoufox path behind a feature flag.
  • First page only: no pagination across multiple result pages. Each platform returns ~20-24 cards on the first page and the default cap is 20, so a maxPerPlatform above roughly 24 is unlikely to yield additional rows.
  • No detail-page scraping: we scrape the search-results surface only. Itineraries, photo galleries, availability calendars, and meeting points are out of scope for v1.
  • Currency follows the platform's display: price_usd is populated only when the platform itself displays USD. We do not run our own FX conversion.
  • Search relevance is the platform's: "Paris" can include cards for Versailles or nearby destinations, depending on each platform's relevance engine.
  • Apify free-tier residential proxy is limited: BUYPROXIES94952 is the only proxy group provisioned on this account, which is sufficient at this Actor's scale.

❓ FAQ

Q: Why is Klook in the schema but returns 0 rows? A: Klook gates every meaningful endpoint behind DataDome JS challenges that curl-cffi cannot clear without a full browser. Adding Klook would require Camoufox, which costs ~10x the compute of HTTP scraping. We kept the platform literal in the schema so v2 can land without breaking the dataset shape — but for v1, every Klook call returns [] with a WARNING. Use platforms: ["viator", "getyourguide"] to skip the wasted call entirely.

Q: Why default useProxy: true? A: GetYourGuide is fronted by Cloudflare and occasionally throttles datacenter IPs. The default-on posture trades a tiny bit of latency for materially higher first-page success rate. If you are running on a clean residential network, you can set it to false.

Q: How is duration_hours parsed? A: Viator and GetYourGuide write durations in a handful of formats: "3 hours", "1 hour", "30 minutes", "5 to 9 hours", "1 day", "2.5 hours". We parse all of them. For ranges, we use the midpoint ("5 to 9 hours" -> 7.0). Anything unparseable stays null rather than crashing the row.
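
A minimal sketch of such a parser, covering only the formats listed in the answer above. The regex and the name `parse_duration_hours` are assumptions for illustration; the Actor's real parser may differ.

```python
import re
from typing import Optional

# Minutes per unit; working in minutes keeps results like 0.5 exact.
_UNIT_MINUTES = {"minute": 1, "hour": 60, "day": 1440}

_DURATION_RE = re.compile(
    r"(\d+(?:\.\d+)?)"                       # first number, e.g. "5" or "2.5"
    r"(?:\s*(?:to|-)\s*(\d+(?:\.\d+)?))?"    # optional range end, e.g. "to 9"
    r"\s*(minute|hour|day)s?",               # unit, singular or plural
    re.IGNORECASE,
)


def parse_duration_hours(text: str) -> Optional[float]:
    """Parse a duration label into hours; ranges collapse to their midpoint."""
    match = _DURATION_RE.search(text)
    if match is None:
        return None  # unparseable labels stay null instead of raising
    low = float(match.group(1))
    high = float(match.group(2)) if match.group(2) else low
    midpoint = (low + high) / 2
    return midpoint * _UNIT_MINUTES[match.group(3).lower()] / 60
```

With this sketch, "5 to 9 hours" gives 7.0, "30 minutes" gives 0.5, "1 day" gives 24.0, and a label such as "Flexible" gives None.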

Q: Will the Actor re-query the same activity across runs? A: Yes — each run is independent. To track an activity over time, schedule periodic runs and write to a named dataset (Actor.open_dataset(name=...)) or export to your warehouse. The Apify default dataset retention is 7 days.

Q: Why no detail-page scraping? A: Each detail page is a heavier scrape (cancellation policy, photos, availability), and many are Cloudflare-protected. v1 ships the breadth-first surface-level price intel that 80% of buyers actually need; detail-page scraping is on the v2 roadmap.

Q: Can I batch multiple cities in one run? A: Not in v1 — one locationQuery per run. To batch, schedule one Actor task per city (Apify supports unlimited parallel tasks on the free tier up to the concurrent-run cap). Each result row carries location_query so downstream pivots stay correct.
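
One way to prepare such a batch is to generate one input payload per city and attach each to its own task. The helper below only builds the payloads, using the input keys documented in this README; creating and scheduling the tasks themselves is left to the Apify Console or API. The function name `build_task_inputs` is hypothetical.

```python
import json


def build_task_inputs(
    cities: list[str],
    platforms: tuple[str, ...] = ("viator", "getyourguide"),
    max_per_platform: int = 20,
) -> list[dict]:
    """Build one Actor input payload per non-blank city."""
    return [
        {
            "locationQuery": city.strip(),
            "platforms": list(platforms),
            "maxPerPlatform": max_per_platform,
            "useProxy": True,
        }
        for city in cities
        if city.strip()
    ]


payloads = build_task_inputs(["Paris", "Tokyo", " ", "Bali"])
first_payload_json = json.dumps(payloads[0], indent=2)
```

The default platform list skips Klook, which avoids the wasted call mentioned in the Klook answer above.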

💬 Your feedback

Found a parser glitch or want a platform added? Open an issue on the Actor's Apify Store page. Pull requests welcome via the source code repository linked in the Actor's metadata.