Viator & GetYourGuide Tours Scraper


Scrape and unify tour and activity prices from Viator and GetYourGuide into one normalized schema (prices, duration, ratings, review counts, and booking URLs per activity), then export to JSON or CSV. A Viator / GetYourGuide API alternative for tour operators and OTA analysts.


Developer

DevilScrapes

Maintained by Community

Viator & GetYourGuide Scraper

We do the dirty work so your dataset stays clean. 😈

$3.05 / 1,000 activity rows: pay only for results that land. No credit card to try.

Viator and GetYourGuide list overlapping inventory with different SKUs and prices. Running two single-source scrapers and writing your own normalization layer costs 1–2 dev-weeks and still leaves you with mismatched schemas. This Actor hits both platforms in one run, absorbs the blocks and retries, and emits one flat ResultRow per activity (platform, price, rating, duration, booking URL) straight into your Apify dataset.

One run. One schema. Ready for your spreadsheet, BI tool, or warehouse.

🎯 What this scrapes

Two major tours-and-activities platforms, unified into one Pydantic-validated schema:

  1. Viator: viator.com/searchResults/all?text=<query> (server-rendered HTML, 24 cards per page, data-automation test attributes)
  2. GetYourGuide: getyourguide.com/s/?q=<query> (Vue.js shell with SSR card content, 24 cards per page)
  3. Klook (v2 upgrade, currently returns 0 rows): klook.com/search/?keyword=<query> is gated by a JS challenge that requires full browser execution. Documented as a future upgrade behind Camoufox; v1 returns [] with a WARNING.

Output rows carry every field needed for cross-platform comparison:

| Field | Type | Description |
| --- | --- | --- |
| platform | string | Platform literal (viator, getyourguide, klook) |
| activity_id | string | Platform-canonical activity ID |
| activity_title | string | Card title text |
| location_query | string | Echo of the user input; useful when batching cities |
| location_city | string or null | Best-effort city parsed from URL or query |
| location_country | string or null | Best-effort country |
| price_usd | number or null | USD price when the platform itself displays USD |
| currency_original | string or null | ISO 4217 code parsed from the price symbol |
| price_original | number or null | Numeric price in the displayed currency |
| duration_hours | number or null | Activity duration in hours; midpoint for ranges |
| rating | number or null | Star rating, 0–5 scale |
| review_count | integer or null | Number of reviews |
| operator_name | string or null | Tour operator/supplier when surfaced |
| category | string or null | Card tag (tour, experience, ...) |
| booking_url | string | Absolute URL to the activity page |
| image_url | string or null | Absolute URL to the primary thumbnail |
| scraped_at | string | ISO 8601 UTC timestamp |

🔥 Features

  • Two platforms, one schema: drop the dataset straight into a spreadsheet or BI tool; no per-platform normalization required.
  • We rotate browser fingerprints: curl-cffi impersonates Chrome 131 / Chrome 124 / Firefox 147 at the TLS+HTTP/2 layer, so both platforms see real-browser traffic, not Python.
  • We retry with exponential backoff: 408 / 429 / 503 responses trigger up to 5 attempts with doubling delays; Retry-After headers are honoured.
  • We rotate residential proxies: BUYPROXIES94952 routing is on by default; a fresh session_id and fresh exit IP are issued on every block.
  • Per-platform isolation: one platform's failure does not abort the run; surviving platforms still produce data.
  • Currency-aware: symbol-to-ISO mapping (€/$/£/¥ → EUR/USD/GBP/JPY); price_usd is populated only when the platform itself displays USD.
  • Duration parser handles ranges: "5 to 9 hours" → 7.0 (midpoint); "30 minutes" → 0.5; "1 day" → 24.0.
  • Pydantic v2 validation: input and every output row are model-validated; invalid input fails fast with a clear error before any network call.
  • Clean dataset rows: ISO-8601 timestamps, stable platform IDs, no half-parsed strings.
  • Configurable cap: maxPerPlatform lets you cap each platform at 1–100 rows per run.
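
The retry behaviour described in the feature list can be sketched roughly as below. This is a minimal illustration, not the Actor's code: the 1-second base delay, the function names, and the shape of the `fetch` callable are all assumptions; only the status codes, the 5-attempt limit, the doubling delays, and the Retry-After handling come from the list above.

```python
import time
from typing import Callable, Optional, Tuple

RETRYABLE_STATUSES = {408, 429, 503}  # statuses named in the feature list
MAX_ATTEMPTS = 5                      # "up to 5 attempts"


def backoff_delay(attempt: int, retry_after: Optional[float] = None, base: float = 1.0) -> float:
    """Delay before the next try; attempt is 1-based. `base` is an assumed value."""
    if retry_after is not None:
        return retry_after  # honour the server's Retry-After hint
    return base * 2 ** (attempt - 1)  # doubling delays: base, 2*base, 4*base, ...


def fetch_with_retry(
    fetch: Callable[[], Tuple[int, Optional[float], str]],
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call `fetch` until it returns a non-retryable status or attempts run out.

    `fetch` is a hypothetical callable returning (status, retry_after_seconds, body).
    """
    for attempt in range(1, MAX_ATTEMPTS + 1):
        status, retry_after, body = fetch()
        if status not in RETRYABLE_STATUSES:
            return body if status == 200 else None
        if attempt < MAX_ATTEMPTS:
            sleep(backoff_delay(attempt, retry_after))
    return None
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a production version would probably also add jitter and rotate the proxy session between attempts.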

💡 Use cases

  • Tour operator competitive intelligence: find every activity your competitors list in your destination, compare prices, ratings, and durations side-by-side.
  • OTA cross-platform analyst dashboards: feed a BI tool with snapshots of how Viator and GetYourGuide each rank the same destination.
  • Dynamic pricing strategy: track how the same activity type is priced on each platform over time and adjust your own listings accordingly.
  • Destination intelligence reports: schedule weekly runs for "Paris" or "Tokyo" into a named dataset and chart price drift.
  • Travel-blogger affiliate research: surface high-rating, high-review-count activities for destination guides without manual browsing.
  • Inbound-tour-builder market research: discover which experiences dominate the first-page results when entering a new destination.
  • Travel-tech investor diligence: benchmark the top-of-funnel pricing across the experience-booking layer of the travel stack.

⚙️ How to use it

  1. Open the Actor input form.
  2. Type a Destination (Paris, New York, Tokyo, Bali, …).
  3. (Optional) Pick Platforms: leave empty to scrape all three, or list a subset like ["viator"] or ["getyourguide"]. Klook returns 0 rows in v1 (documented limitation).
  4. (Optional) Set Max rows per platform β€” default 20, max 100.
  5. Leave Use Apify Proxy on (default) for cleaner exit IPs when platforms throttle datacenter traffic.
  6. Click Start. Results stream into the default dataset.

Quick examples

Both supported platforms, default cap:

{
    "locationQuery": "Paris"
}

GetYourGuide only, 5 rows (fastest path to confirm output shape):

{
    "locationQuery": "Paris",
    "platforms": ["getyourguide"],
    "maxPerPlatform": 5,
    "useProxy": false
}

Viator only, 50 rows:

{
    "locationQuery": "Tokyo",
    "platforms": ["viator"],
    "maxPerPlatform": 50,
    "useProxy": false
}

📥 Input

| JSON key | Type | Default | Description |
| --- | --- | --- | --- |
| locationQuery | string | (required) | Destination text query (e.g. "Paris") |
| platforms | array of literal | [] (= all 3) | Subset of viator / getyourguide / klook |
| maxPerPlatform | integer | 20 | Cap on rows per platform (1–100) |
| useProxy | boolean | true | Route via Apify Proxy BUYPROXIES94952 |

locationQuery is the only required field. Whitespace is stripped; blank values are rejected up-front by Pydantic before any network call is made.
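
The validation rules above can be approximated with a short model. The Actor itself uses Pydantic v2; the sketch below swaps in a stdlib dataclass so it runs without dependencies, and the class name `ActorInput` and the error messages are assumptions. Field names, defaults, and limits are taken from the input table.

```python
from dataclasses import dataclass, field
from typing import List

ALLOWED_PLATFORMS = ("viator", "getyourguide", "klook")


@dataclass
class ActorInput:
    """Stdlib stand-in for the Actor's Pydantic input model (hypothetical name)."""

    locationQuery: str
    platforms: List[str] = field(default_factory=list)
    maxPerPlatform: int = 20
    useProxy: bool = True

    def __post_init__(self) -> None:
        # Whitespace is stripped; blank values are rejected before any network call.
        self.locationQuery = self.locationQuery.strip()
        if not self.locationQuery:
            raise ValueError("locationQuery must not be blank")
        unknown = [p for p in self.platforms if p not in ALLOWED_PLATFORMS]
        if unknown:
            raise ValueError(f"unknown platforms: {unknown}")
        if not 1 <= self.maxPerPlatform <= 100:
            raise ValueError("maxPerPlatform must be between 1 and 100")
        if not self.platforms:
            # An empty list means "all platforms".
            self.platforms = list(ALLOWED_PLATFORMS)
```

For example, `ActorInput(locationQuery="  Paris ")` normalizes the query to "Paris" and expands `platforms` to all three literals, while a blank query raises `ValueError` immediately.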

📤 Output

One row per activity. See the What this scrapes table above for the full schema.

{
    "platform": "getyourguide",
    "activity_id": "508441",
    "activity_title": "Paris: Le Marais Guided Food Tour with Tastings",
    "location_query": "Paris",
    "location_city": "Paris",
    "location_country": null,
    "price_usd": null,
    "currency_original": "EUR",
    "price_original": 69.0,
    "duration_hours": 3.0,
    "rating": 4.9,
    "review_count": 506,
    "operator_name": null,
    "category": "experience",
    "booking_url": "https://www.getyourguide.com/paris-l16/no-diet-club-unique-local-food-tour-in-paris-le-marais-t508441/",
    "image_url": "https://cdn.getyourguide.com/image/.../tour_img/7b9edf635985a601.jpeg",
    "scraped_at": "2026-05-16T22:00:00.000Z"
}
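
Because every row shares this shape, a cross-platform comparison needs very little code. The sketch below is one possible downstream step, not part of the Actor: the function name is hypothetical, and it groups by currency as well as platform because prices are not FX-converted.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple


def median_price_by_platform(rows: Iterable[dict]) -> Dict[Tuple[str, str], float]:
    """Median price_original per (platform, currency_original), skipping rows without a price."""
    buckets: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for row in rows:
        price = row.get("price_original")
        currency = row.get("currency_original")
        if price is None or currency is None:
            continue  # nullable fields: the card showed no parseable price
        buckets[(row["platform"], currency)].append(price)
    return {key: median(values) for key, values in buckets.items()}
```

Feed it the rows from a JSON export (for instance via `json.load`); comparing medians across different currencies would still require your own conversion step.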

💰 Pricing

Pay-Per-Event (PPE): you pay only for results that land:

| Event | Rate | Trigger |
| --- | --- | --- |
| actor-start | $0.05 | Once per run at Actor boot |
| result-row | $0.003 | Per activity row emitted |

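Typical run cost (default maxPerPlatform=20, 2 working platforms, ~40 rows): $0.05 + 40 × $0.003 = ~$0.17. Per 1,000 rows: $3.00 in result-row events plus $0.05 for each run needed to collect them; the ~$3.05 headline figure counts a single start event.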

No results → no charge beyond the $0.05 start event. No subscription, no seat fee.
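
The arithmetic behind these figures is simple enough to script when budgeting scheduled runs. The rates come from the pricing table; the function name and the rounding to four decimals are choices made for this sketch.

```python
ACTOR_START_USD = 0.05   # charged once per run
RESULT_ROW_USD = 0.003   # charged per emitted row


def run_cost(rows: int) -> float:
    """Cost in USD of a single run that emits `rows` activity rows."""
    if rows < 0:
        raise ValueError("rows must be non-negative")
    return round(ACTOR_START_USD + rows * RESULT_ROW_USD, 4)
```

`run_cost(40)` reproduces the ~$0.17 typical run, and `run_cost(0)` is the $0.05 floor for a run that returns nothing.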

🚧 Limitations

  • Klook returns 0 rows in v1: the search endpoint is gated by a JS challenge that curl-cffi cannot clear without a full browser. We document this up front and ship without it rather than over-promise; v2 will add a Camoufox path behind a feature flag.
  • First page only: no pagination across multiple result pages. Each platform returns ~20–24 cards on the first page; the default cap is 20.
  • No detail-page scraping: we scrape the search-results surface only. Itineraries, photo galleries, availability calendars, and meeting points are out of scope for v1.
  • Currency follows the platform's display: price_usd is populated only when the platform itself displays USD. We do not run our own FX conversion.
  • Search relevance is the platform's: "Paris" can include cards for Versailles or nearby destinations, depending on each platform's relevance engine.
  • Apify free-tier residential proxy is limited: BUYPROXIES94952 is the only proxy group provisioned on this account; works for our scale.
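
To make the currency limitation concrete, here is a rough sketch of how a displayed price could map to the three price fields. It is an illustration under assumptions, not the Actor's parser: it assumes the symbol precedes the amount, commas are thousands separators, and the function name `parse_price` is hypothetical. The symbol table matches the one in the feature list.

```python
import re
from typing import Optional, Tuple

SYMBOL_TO_ISO = {"€": "EUR", "$": "USD", "£": "GBP", "¥": "JPY"}


def parse_price(text: str) -> Tuple[Optional[str], Optional[float], Optional[float]]:
    """Return (currency_original, price_original, price_usd) for a displayed price string."""
    match = re.search(r"([€$£¥])\s*(\d[\d,]*(?:\.\d+)?)", text)
    if match is None:
        return None, None, None  # unparseable prices stay null
    currency = SYMBOL_TO_ISO[match.group(1)]
    amount = float(match.group(2).replace(",", ""))
    # No FX conversion: price_usd is only set when the display currency is USD.
    price_usd = amount if currency == "USD" else None
    return currency, amount, price_usd
```

A card showing "From €69" yields EUR and 69.0 with `price_usd` left as `None`, which is why EUR-priced rows in the dataset carry a null `price_usd`.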

❓ FAQ

Q: Does Viator or GetYourGuide offer an official API I can use instead? A: Viator and GetYourGuide do publish partner APIs, but they require approved partner status, commercial agreements, and ongoing approval processes that most independent developers and analysts cannot access. This Actor scrapes the public search-results pages, so no partner relationship is needed. The output schema is compatible with what a partner API would return for the same fields.

Q: Why is Klook in the schema but returns 0 rows? A: Klook gates every meaningful endpoint behind a JS challenge that curl-cffi cannot clear without a full browser. Adding Klook v2 requires Camoufox, which costs roughly 10× the compute of HTTP scraping. We kept the platform literal in the schema so v2 can land without breaking the dataset shape, but for v1, every Klook call returns [] with a WARNING. Use platforms: ["viator", "getyourguide"] to skip the wasted call entirely.

Q: How is duration_hours parsed? A: Viator and GetYourGuide write durations in several formats: "3 hours", "1 hour", "30 minutes", "5 to 9 hours", "1 day", "2.5 hours". We parse all of them. For ranges, we use the midpoint ("5 to 9 hours" → 7.0). Anything unparseable stays null rather than crashing the row.
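
A minimal sketch of a parser with this behaviour follows. The regular expression and the names are assumptions made for illustration, and mixed-unit strings such as "3 hours 30 minutes" are not handled here; only the listed formats, the midpoint rule, and the null fallback come from the answer above.

```python
import re
from typing import Optional

MINUTES_PER_UNIT = {"minute": 1, "hour": 60, "day": 1440}
_NUM = r"(\d+(?:\.\d+)?)"
_PATTERN = re.compile(
    rf"{_NUM}(?:\s*(?:to|-|–)\s*{_NUM})?\s*(minute|hour|day)s?",
    re.IGNORECASE,
)


def parse_duration_hours(text: str) -> Optional[float]:
    """Parse a duration label into hours; ranges collapse to their midpoint."""
    match = _PATTERN.search(text)
    if match is None:
        return None  # unparseable input stays null instead of raising
    low = float(match.group(1))
    high = float(match.group(2)) if match.group(2) else low
    midpoint = (low + high) / 2
    return midpoint * MINUTES_PER_UNIT[match.group(3).lower()] / 60
```

Converting through minutes and dividing by 60 at the end avoids the small floating-point error that multiplying by 1/60 would introduce for values like "30 minutes".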

Q: Can I track an activity's price across multiple runs? A: Yes. Each run is independent, so to track an activity over time, schedule periodic runs and write to a named dataset (Actor.open_dataset(name=...)) or export to your warehouse. The Apify default dataset retention is 7 days; a named dataset persists until you delete it.
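
Once two snapshots exist, price drift can be detected by joining on platform and activity_id. The sketch below is a hypothetical downstream helper, assuming each snapshot is a list of row dicts in the output schema; it skips rows whose displayed currency changed, since no FX conversion is available.

```python
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str]  # (platform, activity_id)


def price_changes(previous: Iterable[dict], current: Iterable[dict]) -> Dict[Key, Tuple[float, float]]:
    """Map activities present in both snapshots to (old_price, new_price) when the price changed."""

    def index(rows: Iterable[dict]) -> Dict[Key, dict]:
        return {
            (r["platform"], r["activity_id"]): r
            for r in rows
            if r.get("price_original") is not None
        }

    old, new = index(previous), index(current)
    changes: Dict[Key, Tuple[float, float]] = {}
    for key in old.keys() & new.keys():
        before, after = old[key], new[key]
        if before["currency_original"] != after["currency_original"]:
            continue  # displayed currency changed; not comparable without FX
        if before["price_original"] != after["price_original"]:
            changes[key] = (before["price_original"], after["price_original"])
    return changes
```

Activities that appear in only one snapshot are ignored here; tracking new or delisted activities would be a separate set difference on the same keys.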

Q: Can I batch multiple cities in one run? A: Not in v1: one locationQuery per run. To batch, create one Actor task per city and schedule them; how many run in parallel is limited by your plan's concurrent-run cap. Each result row carries location_query so downstream pivots stay correct.

Q: Why default useProxy: true? A: Both platforms run behind edge protection and occasionally throttle datacenter IP ranges. The default-on posture trades a small latency overhead for materially higher first-page success rates. If you are running from a clean residential network, you can set it to false.

Q: Why no detail-page scraping? A: Each detail page is a heavier scrape (cancellation policy, photos, availability) and is behind additional edge protection. v1 ships the breadth-first surface-level price intel that 80% of buyers actually need; detail-page scraping is on the v2 roadmap.

💬 Your feedback

Found a parser glitch, a missing platform, or a field that's broken? Open an issue on the Actor's Apify Store page. We read every report. The QA fixture for Paris keeps regressions locked, but real-world destination edge cases always surface new patterns.