Terratur Flight Search Scraper
Pricing
Pay per usage
Scrapes flight offers from terratur.tur.br (Terratur, Brazilian travel agency on OnerTravel/Befly). Inputs origin/destination (IATA or city), dates and passengers; returns airline, price in BRL, times, stops, baggage and segment-by-segment data as JSON. Handles one-way and round-trip.
Developer
Bruno Chiaramonti
Maintained by Community
An Apify Actor that searches flight tickets on terratur.tur.br and writes the offers (price, airline, segments, baggage, etc.) to an Apify Dataset.
terratur.tur.br is a WordPress site that embeds the OnerTravel / Befly white-label flight widget. When a visitor submits the form, the widget redirects to the tenant booking page at https://www.comprarviagem.com.br/terratur/flight-list?…. That page is an Angular SPA — the actual flight search is asynchronous: an HTTP POST kicks off a Lambda job, and the results stream in via a WebSocket (wss://event.onertravel.com/production). Re-implementing the protocol from scratch is fragile, so this actor does the pragmatic thing: it loads the tenant page in a headless Chromium with PlaywrightCrawler and intercepts the /api/flight/v1/search/{outbound,inbound} XHR responses as the browser receives them.
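The interception idea can be sketched as below. This is a minimal illustration, not the actor's source: `searchDirection` and `captureSearchResponses` are hypothetical names, and the shape of the response body is not documented here, so the sketch stores the parsed JSON as-is. Only `page.on('response', …)`, `response.url()`, `response.ok()` and `response.json()` are real Playwright calls.

```javascript
// Path fragments taken from the description above; everything else is illustrative.
const SEARCH_PATHS = [
  '/api/flight/v1/search/outbound',
  '/api/flight/v1/search/inbound',
];

// Returns 'outbound', 'inbound', or null for unrelated URLs.
function searchDirection(url) {
  const { pathname } = new URL(url);
  const hit = SEARCH_PATHS.find((p) => pathname.endsWith(p));
  return hit ? hit.split('/').pop() : null;
}

// Attaches a listener to a Playwright Page (or any object exposing
// `on('response', handler)`) and collects parsed JSON bodies per direction.
function captureSearchResponses(page) {
  const captured = { outbound: [], inbound: [] };
  page.on('response', async (response) => {
    const direction = searchDirection(response.url());
    if (!direction || !response.ok()) return;
    try {
      captured[direction].push(await response.json());
    } catch {
      // Body was not JSON, or the page navigated away; skip it.
    }
  });
  return captured;
}
```

Because the listener only reads responses the browser already received, the page's own cookies and headers do the authentication work and the sketch never builds a search request itself.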
Input
| Field | Type | Default | Notes |
|---|---|---|---|
| `originQuery` | string | "Fortaleza" | City name ("Fortaleza") or IATA code ("FOR"). |
| `destinationQuery` | string | "Sao Paulo" | City name or IATA code. |
| `departureDate` | string (YYYY-MM-DD) | none (required) | Outbound date. |
| `returnDate` | string (YYYY-MM-DD) | none | Leave empty for one-way trips. |
| `adults` | integer | 1 | 12+ years. |
| `children` | integer | 0 | 2–11 years. |
| `infants` | integer | 0 | < 2 years. |
| `maxResults` | integer | 100 | Cap on stored offers (outbound + inbound). |
| `waitForResultsSeconds` | integer | 60 | Per-direction settle window. The actor returns earlier when the WS sends `ENDED` or the count stabilises for ~10 s. |
| `includeInbound` | boolean | true | On round trips, replay the inbound XHR (using the captured `searchKey` + first outbound `flightKey`) to also collect return-leg offers. |
| `tenantUrl` | string | https://www.comprarviagem.com.br/terratur | Booking-side host. Terratur is an agent of Comprar Viagem under the OnerTravel platform; change this only if Terratur migrates tenants. |
| `proxyConfiguration` | proxy | none | Optional Apify Proxy. Leave empty when running locally without credentials. |
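As an illustration of how the defaults in the table combine with user input, here is a small sketch. `prepareInput` and the `roundTrip` flag are hypothetical and not part of the actor; the default values are copied from the table, and the date checks are an assumption about what reasonable validation might look like.

```javascript
// Defaults copied from the table above. departureDate has no default.
const DEFAULTS = {
  originQuery: 'Fortaleza',
  destinationQuery: 'Sao Paulo',
  adults: 1,
  children: 0,
  infants: 0,
  maxResults: 100,
  waitForResultsSeconds: 60,
  includeInbound: true,
  tenantUrl: 'https://www.comprarviagem.com.br/terratur',
};

const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

// Hypothetical helper: merges defaults and rejects malformed dates.
function prepareInput(raw = {}) {
  const input = { ...DEFAULTS, ...raw };
  if (!ISO_DATE.test(input.departureDate ?? '')) {
    throw new Error('departureDate is required and must be YYYY-MM-DD');
  }
  if (input.returnDate) {
    if (!ISO_DATE.test(input.returnDate)) {
      throw new Error('returnDate must be YYYY-MM-DD');
    }
    // Zero-padded ISO dates sort correctly as plain strings.
    if (input.returnDate < input.departureDate) {
      throw new Error('returnDate must not precede departureDate');
    }
  }
  // An empty returnDate means a one-way search.
  input.roundTrip = Boolean(input.returnDate);
  return input;
}
```

For example, `prepareInput({ departureDate: '2026-08-15' })` yields a one-way search with every other field at its default.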
Output
Each Dataset item is one offer:
```json
{
  "direction": "outbound",
  "key": "22d9e9f3-…",
  "airline": "GOL LINHAS AEREAS",
  "airlineIata": "G3",
  "flightNumbers": ["1991", "1635"],
  "origin": "FOR",
  "originCity": "Fortaleza",
  "destination": "CGH",
  "destinationCity": "São Paulo",
  "departure": "2026-08-15T11:55",
  "arrival": "2026-08-15T17:10",
  "durationMinutes": 315,
  "stops": 1,
  "cabinClass": "Econômica",
  "fareFamily": "LIGHT",
  "allowedBaggage": false,
  "baggageAllowance": [
    { "type": 1, "unitDescription": "KG", "quantity": 1, "weight": 10 }
  ],
  "price": 905.49,
  "priceBase": 840.11,
  "priceTax": 65.38,
  "currency": "BRL",
  "segments": [ /* per-leg breakdown */ ],
  "raw": { /* full OnerTravel response for this offer */ }
}
```
The full raw object is retained under raw so downstream consumers can pick fields the normaliser may have missed.
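A downstream consumer might reduce the dataset to the cheapest offer per direction. The sketch below is only an example of working with the normalised fields: `cheapestByDirection` is a hypothetical helper, the sample values are invented, and it assumes return-leg items carry `"direction": "inbound"`.

```javascript
// Groups offers by `direction` and keeps the lowest `price` in each group.
// `maxStops` optionally filters out offers with more connections.
function cheapestByDirection(offers, { maxStops = Infinity } = {}) {
  const best = {};
  for (const offer of offers) {
    if (offer.stops > maxStops) continue;
    const current = best[offer.direction];
    if (!current || offer.price < current.price) {
      best[offer.direction] = offer;
    }
  }
  return best;
}

// Illustrative items using only fields from the sample above (values invented).
const sampleOffers = [
  { direction: 'outbound', key: 'a', price: 905.49, stops: 1 },
  { direction: 'outbound', key: 'b', price: 1120.0, stops: 0 },
  { direction: 'inbound', key: 'c', price: 780.0, stops: 2 },
];

const cheapest = cheapestByDirection(sampleOffers);
console.log(cheapest.outbound.key, cheapest.inbound.key);
```

Passing `{ maxStops: 0 }` restricts the comparison to non-stop offers, in which case a direction with no matching offer is simply absent from the result.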
How it works
- Resolve airports. Hits `https://api.onertravel.com/api/airport/search?name=…&isDeparture=…` to turn city names into IATA codes. If you already pass IATA codes, the lookup is skipped.
- Open the tenant flight-list page. Builds the canonical URL the Befly widget would redirect to and loads it with `PlaywrightCrawler`. Cookies, headers and CORS context are inherited from the page so Lambda requests look exactly like the widget's.
- Listen to the page. Two listeners run for the lifetime of the page load:
  - `page.on('response', …)` captures every `/api/flight/v1/search/outbound` and `/inbound` XHR and parses the flight array.
  - `page.on('websocket', …)` watches `wss://event.onertravel.com/production`; the `ENDED` frame is our signal that the outbound search is done.
- Wait for outbound to settle. Returns early on `ENDED` or when the count stays stable for ~10 seconds.
- Replay inbound (round trips only). Uses `page.request.post(…/inbound)` with `{searchKey, flightKey: firstOutbound.key, page, pageSize, filter}`. The inbound Lambda needs both keys; see the `Be` DTO in the OnerTravel bundle.
- Normalise + push. Each flight is flattened to a query-friendly shape (ISO timestamps, IATA codes, minutes), deduplicated by `key`, then `Actor.pushData()`'d.
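The settle step can be pictured with the sketch below. It is an approximation under stated assumptions, not the actor's code: `SettleTracker` and `waitForSettle` are hypothetical names, the 10 s and 60 s values mirror the `~10 s` stability window and the `waitForResultsSeconds` default, and requiring at least one captured flight before the stability rule applies is a guess.

```javascript
// Tracks whether a search direction has settled. The clock is injectable
// so the logic can be exercised without real waiting.
class SettleTracker {
  constructor({ stableMs = 10_000, timeoutMs = 60_000, now = Date.now } = {}) {
    this.stableMs = stableMs;
    this.timeoutMs = timeoutMs;
    this.now = now;
    this.startedAt = now();
    this.lastChangeAt = this.startedAt;
    this.count = 0;
    this.ended = false;
  }

  // Record the current number of captured flights.
  update(count) {
    if (count !== this.count) {
      this.count = count;
      this.lastChangeAt = this.now();
    }
  }

  // Call when the WebSocket delivers the ENDED frame.
  markEnded() {
    this.ended = true;
  }

  isSettled() {
    const t = this.now();
    if (this.ended) return true;                              // explicit signal
    if (t - this.startedAt >= this.timeoutMs) return true;    // hard cap
    // Assumption: only treat the count as stable once something arrived.
    return this.count > 0 && t - this.lastChangeAt >= this.stableMs;
  }
}

// Polls `getCount` until the tracker reports a settled state.
async function waitForSettle(tracker, getCount, pollMs = 250) {
  for (;;) {
    tracker.update(getCount());
    if (tracker.isSettled()) return;
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
}
```

Combining an explicit end signal, a stability window and a hard timeout means a missed `ENDED` frame delays the run by a few seconds instead of stalling it.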
The actor also persists up to three debug values in the default key-value store:
- `SAMPLE_RESPONSE`: full first `/outbound` payload (for schema debugging).
- `SAMPLE_REQUEST`: the request body the widget sent (so you can see the `searchKey` it used).
- `FLIGHT_LIST_HTML`: only written when no flights were captured, to help diagnose layout drift.
Running locally
```bash
npm install
npx playwright install chromium
mkdir -p storage/key_value_stores/default
cat > storage/key_value_stores/default/INPUT.json <<'JSON'
{
  "originQuery": "Fortaleza",
  "destinationQuery": "Sao Paulo",
  "departureDate": "2026-08-15",
  "maxResults": 20
}
JSON
CRAWLEE_HEADLESS=1 APIFY_LOCAL_STORAGE_DIR=./storage npm start
```
Results land in storage/datasets/default/.
Running on Apify
```bash
apify login
apify push
```
The Docker image is based on apify/actor-node-playwright-chrome:24 so Chromium + Playwright are preinstalled.
Tests
```bash
npm test
```
The unit tests cover the pure helpers (isIata, buildFlightListUrl). The browser-driven path is intentionally not run in CI — it depends on the live OnerTravel backend and takes ~25 s per scenario.
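To give a feel for what a pure-helper test looks like, here is a stand-in for `isIata` with a table-driven check. The real helper's implementation is not shown in this README, so the rule used here (exactly three uppercase ASCII letters, surrounding whitespace ignored) is an assumption, as is rejecting lowercase input.

```javascript
// Hypothetical stand-in for the actor's helper: three uppercase ASCII letters.
function isIata(query) {
  return typeof query === 'string' && /^[A-Z]{3}$/.test(query.trim());
}

// [input, expected] pairs covering codes, city names and malformed values.
const cases = [
  ['FOR', true],
  [' GRU ', true],
  ['Fortaleza', false],
  ['for', false],
  ['F0R', false],
  [undefined, false],
];

for (const [input, expected] of cases) {
  if (isIata(input) !== expected) {
    throw new Error(`isIata(${JSON.stringify(input)}) should be ${expected}`);
  }
}
console.log(`${cases.length} isIata cases passed`);
```

A check like this needs no browser or network access, which is what keeps it suitable for CI.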
When this might break
- OnerTravel renames `/api/flight/v1/search/{outbound,inbound}` or changes the request DTO. Inspect `widget-befly.js`; the path strings and `apiBaseUrlFlight` are unminified inside.
- Terratur migrates to a different OnerTravel institution/agent. The widget pulls `agencyPath` from `https://api.onertravel.com/api/institutionWidgetConfiguration` (`Origin: https://terratur.tur.br`). Override `tenantUrl` in the input.
- The WebSocket protocol changes the `ENDED` frame format. The `framereceived` handler does a substring match; adjust to taste.
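For the last case, the substring match might look roughly like the sketch below, which shows where an adjustment would go. `frameSignalsEnd` and `watchForEnded` are hypothetical names and the actual frame layout is not documented here; `page.on('websocket', …)`, `webSocket.url()` and the `framereceived` event with its `payload` field are real Playwright APIs.

```javascript
// Socket URL taken from the description above.
const EVENT_SOCKET_URL = 'wss://event.onertravel.com/production';

// Substring check on a frame payload, which Playwright delivers as a
// string (text frame) or a Buffer (binary frame).
function frameSignalsEnd(payload) {
  const text = typeof payload === 'string'
    ? payload
    : Buffer.from(payload).toString('utf8');
  return text.includes('ENDED');
}

// Calls `onEnded` whenever a frame on the event socket mentions ENDED.
// Works with a Playwright Page or any object with the same `on` interface.
function watchForEnded(page, onEnded) {
  page.on('websocket', (ws) => {
    if (!ws.url().startsWith(EVENT_SOCKET_URL)) return;
    ws.on('framereceived', ({ payload }) => {
      if (frameSignalsEnd(payload)) onEnded();
    });
  });
}
```

If the frame format changes, `frameSignalsEnd` is the only function that needs editing, for example to parse the payload as JSON and inspect a specific field.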