Viator & GetYourGuide Tours Scraper
Pricing
Pay per event
Scrape and unify tour and activity prices from Viator and GetYourGuide into one normalized schema (prices, duration, ratings, review counts, and booking URLs per activity), then export to JSON or CSV. A Viator / GetYourGuide API alternative for tour operators and OTA analysts.
Developer: DevilScrapes (maintained by Community)
Viator & GetYourGuide Scraper
We do the dirty work so your dataset stays clean.
$3.05 / 1,000 activity rows: pay only for results that land. No credit card to try.
Viator and GetYourGuide list overlapping inventory with different SKUs and prices. Running two single-source scrapers and writing your own normalization layer costs 1-2 dev-weeks and still leaves you with mismatched schemas. This Actor hits both platforms in one run, absorbs the blocks and retries, and emits one flat `ResultRow` per activity (platform, price, rating, duration, booking URL) straight into your Apify dataset.
One run. One schema. Ready for your spreadsheet, BI tool, or warehouse.
What this scrapes
Two major tours-and-activities platforms, unified into one Pydantic-validated schema:
- Viator: `viator.com/searchResults/all?text=<query>` (server-rendered HTML, 24 cards per page, `data-automation` test attributes)
- GetYourGuide: `getyourguide.com/s/?q=<query>` (Vue.js shell with SSR card content, 24 cards per page)
- Klook (v2 upgrade, currently returns 0 rows): `klook.com/search/?keyword=<query>` is gated by a JS challenge that requires full browser execution. Documented as a future upgrade behind Camoufox; v1 returns `[]` with a WARNING.
Output rows carry every field needed for cross-platform comparison:
| Field | Type | Description |
|---|---|---|
| `platform` | string | Platform literal (`viator`, `getyourguide`, `klook`) |
| `activity_id` | string | Platform-canonical activity ID |
| `activity_title` | string | Card title text |
| `location_query` | string | Echo of the user input; useful when batching cities |
| `location_city` | string \| null | Best-effort city parsed from URL or query |
| `location_country` | string \| null | Best-effort country |
| `price_usd` | number \| null | USD price when the platform itself displays USD |
| `currency_original` | string \| null | ISO 4217 code parsed from the price symbol |
| `price_original` | number \| null | Numeric price in the displayed currency |
| `duration_hours` | number \| null | Activity duration in hours; midpoint for ranges |
| `rating` | number \| null | Star rating, 0-5 scale |
| `review_count` | integer \| null | Number of reviews |
| `operator_name` | string \| null | Tour operator/supplier when surfaced |
| `category` | string \| null | Card tag (tour, experience, ...) |
| `booking_url` | string | Absolute URL to the activity page |
| `image_url` | string \| null | Absolute URL to the primary thumbnail |
| `scraped_at` | string | ISO 8601 UTC timestamp |
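As one way to use these fields for cross-platform comparison, the sketch below groups exported rows by platform and original currency and reports the median price. The helper name `median_price_by_platform` and the sample rows are invented for illustration; they are not part of the Actor.

```python
from collections import defaultdict
from statistics import median


def median_price_by_platform(rows):
    """Return {(platform, currency): median price}, skipping rows without a price."""
    buckets = defaultdict(list)
    for row in rows:
        price = row.get("price_original")
        currency = row.get("currency_original")
        if price is None or currency is None:
            continue  # both fields are nullable in the schema
        buckets[(row["platform"], currency)].append(price)
    return {key: median(prices) for key, prices in buckets.items()}


# Illustrative rows only (invented values, trimmed to the fields used here).
sample_rows = [
    {"platform": "viator", "price_original": 60.0, "currency_original": "EUR"},
    {"platform": "viator", "price_original": 80.0, "currency_original": "EUR"},
    {"platform": "getyourguide", "price_original": 69.0, "currency_original": "EUR"},
    {"platform": "getyourguide", "price_original": None, "currency_original": None},
]

medians = median_price_by_platform(sample_rows)
print(medians)
```

Grouping by currency as well as platform avoids mixing EUR and USD prices, since the Actor does not convert currencies.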
Features
- Two platforms, one schema: drop the dataset straight into a spreadsheet or BI tool; no per-platform normalization required.
- We rotate browser fingerprints: `curl-cffi` impersonates Chrome 131 / Chrome 124 / Firefox 147 at the TLS+HTTP/2 layer, so both platforms see real-browser traffic, not Python.
- We retry with exponential backoff: `408 / 429 / 503` responses trigger up to 5 attempts with doubling delays; `Retry-After` headers are honoured.
- We rotate residential proxies: `BUYPROXIES94952` routing is on by default; a fresh `session_id` and fresh exit IP are issued on every block.
- Per-platform isolation: one platform's failure does not abort the run; surviving platforms still produce data.
- Currency-aware: symbol-to-ISO mapping (€/$/£/¥ → EUR/USD/GBP/JPY); `price_usd` is populated only when the platform itself displays USD.
- Duration parser handles ranges: `"5 to 9 hours"` → `7.0` (midpoint); `"30 minutes"` → `0.5`; `"1 day"` → `24.0`.
- Pydantic v2 validation: input and every output row are model-validated; invalid input fails fast with a clear error before any network call.
- Clean dataset rows: ISO-8601 timestamps, stable platform IDs, no half-parsed strings.
- Configurable cap: `maxPerPlatform` lets you cap each platform at 1-100 rows per run.
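The retry behaviour listed above (up to 5 attempts on 408 / 429 / 503, doubling delays, `Retry-After` honoured) can be sketched roughly as follows. This is not the Actor's actual code: `fetch_with_backoff`, the `fetch` callable returning `(status, headers, body)`, and the 1-second base delay are all assumptions made for illustration, and only the seconds form of `Retry-After` is handled.

```python
import time

RETRYABLE_STATUSES = {408, 429, 503}


def fetch_with_backoff(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a non-retryable status or attempts run out.

    fetch is a hypothetical callable returning (status_code, headers, body).
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        status, headers, body = fetch()
        if status not in RETRYABLE_STATUSES:
            return status, body
        if attempt == max_attempts:
            break
        # Prefer the server's Retry-After (seconds form) over our own delay.
        retry_after = headers.get("Retry-After", "")
        wait = float(retry_after) if retry_after.isdigit() else delay
        sleep(wait)
        delay *= 2  # double the fallback delay after every failed attempt
    raise RuntimeError(f"gave up after {max_attempts} attempts (last status {status})")


# Simulated responses so the sketch runs without any network access.
responses = iter([
    (429, {"Retry-After": "3"}, ""),
    (503, {}, ""),
    (200, {}, "<html>ok</html>"),
])
waits = []
result = fetch_with_backoff(lambda: next(responses), sleep=waits.append)
print(result, waits)
```

Injecting `sleep` keeps the example fast and lets you inspect the waits; a real caller would leave the default `time.sleep` in place.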
Use cases
- Tour operator competitive intelligence: find every activity your competitors list in your destination, compare prices, ratings, and durations side-by-side.
- OTA cross-platform analyst dashboards: feed a BI tool with snapshots of how Viator and GetYourGuide each rank the same destination.
- Dynamic pricing strategy: track how the same activity type is priced on each platform over time and adjust your own listings accordingly.
- Destination intelligence reports: schedule weekly runs for "Paris" or "Tokyo" into a named dataset and chart price drift.
- Travel-blogger affiliate research: surface high-rating, high-review-count activities for destination guides without manual browsing.
- Inbound-tour-builder market research: discover which experiences dominate the first-page results when entering a new destination.
- Travel-tech investor diligence: benchmark the top-of-funnel pricing across the experience-booking layer of the travel stack.
How to use it
- Open the Actor input form.
- Type a Destination (`Paris`, `New York`, `Tokyo`, `Bali`, …).
- (Optional) Pick Platforms: leave empty to scrape all three, or list a subset like `["viator"]` or `["getyourguide"]`. Klook returns 0 rows in v1 (documented limitation).
- (Optional) Set Max rows per platform: default 20, max 100.
- Leave Use Apify Proxy on (default) for cleaner exit IPs when platforms throttle datacenter traffic.
- Click Start. Results stream into the default dataset.
Quick examples
Both supported platforms, default cap:
{"locationQuery": "Paris"}
GetYourGuide only, 5 rows (fastest path to confirm output shape):
{"locationQuery": "Paris","platforms": ["getyourguide"],"maxPerPlatform": 5,"useProxy": false}
Viator only, 50 rows:
{"locationQuery": "Tokyo","platforms": ["viator"],"maxPerPlatform": 50,"useProxy": false}
Input
| JSON key | Type | Default | Description |
|---|---|---|---|
| `locationQuery` | string | (required) | Destination text query (e.g. "Paris") |
| `platforms` | array of literal | `[]` (= all 3) | Subset of `viator` / `getyourguide` / `klook` |
| `maxPerPlatform` | integer | `20` | Cap on rows per platform (1-100) |
| `useProxy` | boolean | `true` | Route via Apify Proxy `BUYPROXIES94952` |
locationQuery is the only required field. Whitespace is stripped; blank values are rejected up-front by Pydantic before any network call is made.
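The validation rules above can be approximated with a plain dataclass. This is a standard-library stand-in for the Actor's Pydantic model, which is not shown here; the class name `ActorInput` and the error messages are assumptions.

```python
from dataclasses import dataclass, field

ALLOWED_PLATFORMS = ("viator", "getyourguide", "klook")


@dataclass
class ActorInput:
    """Stand-in for the Actor's input model (hypothetical name and messages)."""

    locationQuery: str
    platforms: list = field(default_factory=list)
    maxPerPlatform: int = 20
    useProxy: bool = True

    def __post_init__(self):
        # Strip whitespace first, then reject blank queries before any network call.
        self.locationQuery = self.locationQuery.strip()
        if not self.locationQuery:
            raise ValueError("locationQuery must not be blank")
        unknown = [p for p in self.platforms if p not in ALLOWED_PLATFORMS]
        if unknown:
            raise ValueError(f"unknown platforms: {unknown}")
        if not 1 <= self.maxPerPlatform <= 100:
            raise ValueError("maxPerPlatform must be between 1 and 100")
        if not self.platforms:
            # An empty list means "all platforms".
            self.platforms = list(ALLOWED_PLATFORMS)


parsed = ActorInput(**{"locationQuery": "  Paris ", "maxPerPlatform": 5})
print(parsed)
```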
Output
One row per activity. See the What this scrapes table above for the full schema.
{"platform": "getyourguide","activity_id": "508441","activity_title": "Paris: Le Marais Guided Food Tour with Tastings","location_query": "Paris","location_city": "Paris","location_country": null,"price_usd": null,"currency_original": "EUR","price_original": 69.0,"duration_hours": 3.0,"rating": 4.9,"review_count": 506,"operator_name": null,"category": "experience","booking_url": "https://www.getyourguide.com/paris-l16/no-diet-club-unique-local-food-tour-in-paris-le-marais-t508441/","image_url": "https://cdn.getyourguide.com/image/.../tour_img/7b9edf635985a601.jpeg","scraped_at": "2026-05-16T22:00:00.000Z"}
Pricing
Pay-Per-Event (PPE): you pay only for results that land.
| Event | Rate | Trigger |
|---|---|---|
| `actor-start` | $0.05 | Once per run at Actor boot |
| `result-row` | $0.003 | Per activity row emitted |
Typical run cost (default `maxPerPlatform=20`, 2 working platforms, ~40 rows): ~$0.17.
Per 1,000 rows extrapolated: ~$3.05 (1,000 × $0.003 plus a single $0.05 start event; each additional run adds another $0.05).
No results, no charge beyond the $0.05 start event. No subscription, no seat fee.
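The arithmetic behind these figures can be checked with a few lines of Python. The function `estimate_cost` and its `runs` parameter are illustrative additions; the two rates come from the event table above. Because a run with two working platforms tops out at 200 rows (2 × 100), collecting 1,000 rows takes several runs, each paying its own start event.

```python
from decimal import Decimal

ACTOR_START_FEE = Decimal("0.05")   # charged once per run
RESULT_ROW_FEE = Decimal("0.003")   # charged per emitted row


def estimate_cost(rows, runs=1):
    """Estimated charge in USD for `rows` total rows spread over `runs` runs."""
    return ACTOR_START_FEE * runs + RESULT_ROW_FEE * rows


print(estimate_cost(40))        # 0.170
print(estimate_cost(1000))      # 3.050
print(estimate_cost(1000, 5))   # 3.250
```

`Decimal` is used instead of floats so the cents add up exactly.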
Limitations
- Klook returns 0 rows in v1: the search endpoint is gated by a JS challenge that `curl-cffi` cannot clear without a full browser. We document this up front and ship without it rather than over-promise; v2 will add a Camoufox path behind a feature flag.
- First page only: no pagination across multiple result pages. Each platform returns ~20-24 cards on the first page; the default cap is 20.
- No detail-page scraping: we scrape the search-results surface only. Itineraries, photo galleries, availability calendars, and meeting points are out of scope for v1.
- Currency follows the platform's display: `price_usd` is populated only when the platform itself displays USD. We do not run our own FX conversion.
- Search relevance is the platform's: "Paris" can include cards for Versailles or nearby destinations, depending on each platform's relevance engine.
- Apify free-tier residential proxy is limited: `BUYPROXIES94952` is the only proxy group provisioned on this account; works for our scale.
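To make the currency limitation concrete, here is a minimal sketch of symbol-to-ISO parsing in which `price_usd` is filled only for USD prices. The function `parse_price` and its regular expression are assumptions, not the Actor's parser; the sketch handles only a leading symbol with `.` as the decimal separator and follows the four-symbol mapping listed under Features.

```python
import re

SYMBOL_TO_ISO = {"€": "EUR", "$": "USD", "£": "GBP", "¥": "JPY"}
_PRICE_RE = re.compile(r"([€$£¥])\s*(\d[\d,]*(?:\.\d+)?)")


def parse_price(text):
    """Return (currency_original, price_original, price_usd) from a displayed price."""
    match = _PRICE_RE.search(text)
    if not match:
        return None, None, None
    currency = SYMBOL_TO_ISO[match.group(1)]
    amount = float(match.group(2).replace(",", ""))  # drop thousands separators
    price_usd = amount if currency == "USD" else None  # no FX conversion
    return currency, amount, price_usd


print(parse_price("From €69"))
print(parse_price("$1,250.50"))
print(parse_price("Free"))
```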
FAQ
Q: Does Viator or GetYourGuide offer an official API I can use instead?
A: Viator and GetYourGuide do publish partner APIs, but they require approved partner status, commercial agreements, and ongoing approval processes that most independent developers and analysts cannot access. This Actor scrapes the public search-results pages, so no partner relationship is needed. The output schema is compatible with what a partner API would return for the same fields.
Q: Why is Klook in the schema but returns 0 rows?
A: Klook gates every meaningful endpoint behind a JS challenge that `curl-cffi` cannot clear without a full browser. Adding Klook in v2 requires Camoufox, which costs roughly 10× the compute of HTTP scraping. We kept the platform literal in the schema so v2 can land without breaking the dataset shape, but in v1 every Klook call returns `[]` with a WARNING. Use `platforms: ["viator", "getyourguide"]` to skip the wasted call entirely.
Q: How is duration_hours parsed?
A: Viator and GetYourGuide write durations in several formats: "3 hours", "1 hour", "30 minutes", "5 to 9 hours", "1 day", "2.5 hours". We parse all of them. For ranges, we use the midpoint ("5 to 9 hours" → 7.0). Anything unparseable stays null rather than crashing the row.
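A minimal sketch of such a parser, covering the formats listed in this answer, might look like the following. The function name `parse_duration_hours` and the regular expression are assumptions rather than the Actor's actual implementation, and compound strings such as "1 hour 30 minutes" are not handled here.

```python
import re

UNIT_MINUTES = {"minute": 1, "hour": 60, "day": 1440}
_DURATION_RE = re.compile(
    r"(\d+(?:\.\d+)?)"                      # first number
    r"(?:\s*(?:to|-)\s*(\d+(?:\.\d+)?))?"   # optional end of a range
    r"\s*(minute|hour|day)s?",              # unit, singular or plural
    re.IGNORECASE,
)


def parse_duration_hours(text):
    """Return the duration in hours (midpoint for ranges), or None if unparseable."""
    match = _DURATION_RE.search(text or "")
    if not match:
        return None
    low = float(match.group(1))
    high = float(match.group(2)) if match.group(2) else low
    midpoint = (low + high) / 2
    return midpoint * UNIT_MINUTES[match.group(3).lower()] / 60


for sample in ["3 hours", "30 minutes", "5 to 9 hours", "1 day", "2.5 hours", "Flexible"]:
    print(sample, "->", parse_duration_hours(sample))
```

Returning `None` for unmatched text mirrors the "stays null rather than crashing the row" behaviour described above.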
Q: Can I track an activity's price across multiple runs?
A: Yes. Each run is independent, so to track an activity over time, schedule periodic runs and write to a named dataset (`Actor.open_dataset(name=...)`) or export to your warehouse. The Apify default dataset retention is 7 days; a named dataset persists until you delete it.
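Once rows from several runs sit in one place, a price history can be built by keying on `platform` and `activity_id`. The sketch below assumes the rows have already been exported as dictionaries; `price_history`, `price_change`, and the sample values are invented for illustration.

```python
from collections import defaultdict


def price_history(rows):
    """Group rows from many runs into {(platform, activity_id): [(scraped_at, price), ...]}."""
    history = defaultdict(list)
    for row in rows:
        if row.get("price_original") is None:
            continue
        key = (row["platform"], row["activity_id"])
        history[key].append((row["scraped_at"], row["price_original"]))
    # ISO 8601 UTC timestamps in one fixed format sort chronologically as strings.
    return {key: sorted(points) for key, points in history.items()}


def price_change(points):
    """Difference between the latest and the earliest observed price."""
    return points[-1][1] - points[0][1]


# Invented rows standing in for two scheduled runs a week apart.
rows = [
    {"platform": "getyourguide", "activity_id": "508441",
     "price_original": 72.0, "scraped_at": "2026-05-23T22:00:00.000Z"},
    {"platform": "getyourguide", "activity_id": "508441",
     "price_original": 69.0, "scraped_at": "2026-05-16T22:00:00.000Z"},
]

history = price_history(rows)
print(price_change(history[("getyourguide", "508441")]))
```

The comparison is only meaningful when `currency_original` is the same across runs, which the sketch does not check.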
Q: Can I batch multiple cities in one run?
A: Not in v1: one `locationQuery` per run. To batch, schedule one Actor task per city (Apify runs tasks in parallel up to your plan's concurrent-run cap, including on the free tier). Each result row carries `location_query` so downstream pivots stay correct.
Q: Why default useProxy: true?
A: Both platforms run behind edge protection and occasionally throttle datacenter IP ranges. The default-on posture trades a small latency overhead for materially higher first-page success rates. If you are running from a clean residential network, you can set it to false.
Q: Why no detail-page scraping?
A: Each detail page is a heavier scrape (cancellation policy, photos, availability) and is behind additional edge protection. v1 ships the breadth-first surface-level price intel that 80% of buyers actually need; detail-page scraping is on the v2 roadmap.
Your feedback
Found a parser glitch, a missing platform, or a field that's broken? Open an issue on the Actor's Apify Store page. We read every report. The QA fixture for Paris keeps regressions locked, but real-world destination edge cases always surface new patterns.