Terratur Flight Search Scraper

Pricing

Pay per usage

Scrapes flight offers from terratur.tur.br (Terratur, Brazilian travel agency on OnerTravel/Befly). Inputs origin/destination (IATA or city), dates and passengers; returns airline, price in BRL, times, stops, baggage and segment-by-segment data as JSON. Handles one-way and round-trip.


Developer

Bruno Chiaramonti

Maintained by Community

An Apify Actor that searches flight tickets on terratur.tur.br and writes the offers (price, airline, segments, baggage, etc.) to an Apify Dataset.

terratur.tur.br is a WordPress site that embeds the OnerTravel / Befly white-label flight widget. When a visitor submits the form, the widget redirects to the tenant booking page at https://www.comprarviagem.com.br/terratur/flight-list?…. That page is an Angular SPA, and the flight search itself is asynchronous: an HTTP POST kicks off a Lambda job, and the results stream in over a WebSocket (wss://event.onertravel.com/production).

Re-implementing that protocol from scratch would be fragile, so this actor takes the pragmatic route: it loads the tenant page in headless Chromium with PlaywrightCrawler and intercepts the /api/flight/v1/search/{outbound,inbound} XHR responses as the browser receives them.

Input

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| originQuery | string | "Fortaleza" | City name ("Fortaleza") or IATA code ("FOR"). |
| destinationQuery | string | "Sao Paulo" | City name or IATA code. |
| departureDate | string (YYYY-MM-DD) | (required) | Outbound date. |
| returnDate | string (YYYY-MM-DD) | | Leave empty for one-way trips. |
| adults | integer | 1 | 12+ years. |
| children | integer | 0 | 2–11 years. |
| infants | integer | 0 | < 2 years. |
| maxResults | integer | 100 | Cap on stored offers (outbound + inbound). |
| waitForResultsSeconds | integer | 60 | Per-direction settle window. The actor returns earlier when the WS sends ENDED or the count stabilises for ~10 s. |
| includeInbound | boolean | true | On round trips, replay the inbound XHR (using the captured searchKey + first outbound flightKey) to also collect return-leg offers. |
| tenantUrl | string | https://www.comprarviagem.com.br/terratur | Booking-side host. Terratur is an agent of Comprar Viagem under the OnerTravel platform; change this only if Terratur migrates tenants. |
| proxyConfiguration | proxy | | Optional Apify Proxy. Leave empty when running locally without credentials. |
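To make the defaults above concrete, here is a minimal sketch of how an input object could be normalised before a run. The helper name `withDefaults`, the derived `roundTrip` flag and the validation rules are assumptions for illustration; the actor's own input handling is not shown in this README.

```javascript
// Hypothetical helper, not taken from the actor's source.
// Default values are copied from the Input table.
const DEFAULTS = {
  originQuery: 'Fortaleza',
  destinationQuery: 'Sao Paulo',
  adults: 1,
  children: 0,
  infants: 0,
  maxResults: 100,
  waitForResultsSeconds: 60,
  includeInbound: true,
  tenantUrl: 'https://www.comprarviagem.com.br/terratur',
};

const DATE_RE = /^\d{4}-\d{2}-\d{2}$/;

function withDefaults(input) {
  const merged = { ...DEFAULTS, ...input };
  // departureDate is the only required field.
  if (!DATE_RE.test(merged.departureDate ?? '')) {
    throw new Error('departureDate is required and must be YYYY-MM-DD');
  }
  // An empty or missing returnDate means a one-way trip.
  if (merged.returnDate && !DATE_RE.test(merged.returnDate)) {
    throw new Error('returnDate must be YYYY-MM-DD when provided');
  }
  // Convenience flag (an assumption of this sketch, not an actor field).
  merged.roundTrip = Boolean(merged.returnDate);
  return merged;
}
```

Calling `withDefaults({ departureDate: '2026-08-15' })` yields a one-way search for one adult with the default city pair.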

Output

Each Dataset item is one offer:

{
  "direction": "outbound",
  "key": "22d9e9f3-…",
  "airline": "GOL LINHAS AEREAS",
  "airlineIata": "G3",
  "flightNumbers": ["1991", "1635"],
  "origin": "FOR",
  "originCity": "Fortaleza",
  "destination": "CGH",
  "destinationCity": "São Paulo",
  "departure": "2026-08-15T11:55",
  "arrival": "2026-08-15T17:10",
  "durationMinutes": 315,
  "stops": 1,
  "cabinClass": "Econômica",
  "fareFamily": "LIGHT",
  "allowedBaggage": false,
  "baggageAllowance": [ { "type": 1, "unitDescription": "KG", "quantity": 1, "weight": 10 } ],
  "price": 905.49,
  "priceBase": 840.11,
  "priceTax": 65.38,
  "currency": "BRL",
  "segments": [ /* per-leg breakdown */ ],
  "raw": { /* full OnerTravel response for this offer */ }
}

The full raw object is retained under raw so downstream consumers can pick fields the normaliser may have missed.
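As an illustration of working with the flattened shape, the sketch below picks the cheapest offer per direction from a list of Dataset items. The helper `cheapestByDirection` is hypothetical, and the sample items are made-up data that only reuse field names from the example above.

```javascript
// Hypothetical consumer-side helper; field names follow the sample item above.
function cheapestByDirection(items, maxStops = Infinity) {
  const best = {};
  for (const item of items) {
    // Skip offers with more stops than the caller accepts.
    if (item.stops > maxStops) continue;
    const current = best[item.direction];
    if (!current || item.price < current.price) best[item.direction] = item;
  }
  return best;
}

// Made-up sample data for illustration only.
const sample = [
  { direction: 'outbound', key: 'a', price: 905.49, stops: 1, currency: 'BRL' },
  { direction: 'outbound', key: 'b', price: 1120.0, stops: 0, currency: 'BRL' },
  { direction: 'inbound', key: 'c', price: 780.0, stops: 1, currency: 'BRL' },
];

console.log(cheapestByDirection(sample).outbound.key);    // a
console.log(cheapestByDirection(sample, 0).outbound.key); // b
```

Anything the normaliser does not expose can still be read from each item's raw field.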

How it works

  1. Resolve airports. Hits https://api.onertravel.com/api/airport/search?name=…&isDeparture=… to turn city names into IATA codes. If you already pass IATA codes the lookup is skipped.
  2. Open the tenant flight-list page. Builds the canonical URL the Befly widget would redirect to and loads it with PlaywrightCrawler. Cookies, headers and CORS context are inherited from the page so Lambda requests look exactly like the widget's.
  3. Listen to the page. Two listeners run for the lifetime of the page load:
    • page.on('response', …) captures every /api/flight/v1/search/outbound and /inbound XHR and parses the flight array.
    • page.on('websocket', …) watches wss://event.onertravel.com/production; the ENDED frame is our signal that the outbound search is done.
  4. Wait for outbound to settle. Returns early on ENDED or when the count stays stable for ~10 seconds.
  5. Replay inbound (round trips only). Uses page.request.post(…/inbound) with {searchKey, flightKey: firstOutbound.key, page, pageSize, filter} — the inbound Lambda needs both keys, see the Be DTO in the OnerTravel bundle.
  6. Normalise + push. Each flight is flattened to a query-friendly shape (ISO timestamps, IATA codes, minutes), deduplicated by key, then Actor.pushData()'d.
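The settle rule from step 4 can be sketched as a small polling loop. This is a minimal sketch under stated assumptions, not the actor's code: the `state` object (a `flights` array plus an `ended` flag set by the two listeners), the polling interval, and the requirement that at least one flight has arrived before the count is treated as stable are all hypothetical.

```javascript
// Sketch of the settle rule; names and the polling approach are assumptions.
// `state.flights` is filled by the response listener, `state.ended` is set by
// the WebSocket listener when the ENDED frame arrives.
async function waitForSettle(state, opts = {}) {
  const {
    timeoutMs = 60_000, // corresponds to waitForResultsSeconds
    stableMs = 10_000,  // "count stays stable for ~10 seconds"
    pollMs = 250,
    now = Date.now,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = opts;

  const start = now();
  let lastCount = state.flights.length;
  let lastChange = start;

  while (now() - start < timeoutMs) {
    if (state.ended) return 'ended';
    const count = state.flights.length;
    if (count !== lastCount) {
      // New results arrived: restart the stability window.
      lastCount = count;
      lastChange = now();
    } else if (count > 0 && now() - lastChange >= stableMs) {
      return 'stable';
    }
    await sleep(pollMs);
  }
  return 'timeout';
}
```

The `now` and `sleep` options exist only so the loop can be exercised with a fake clock instead of real waiting.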

The actor also persists up to three debug values in the default key-value store:

  • SAMPLE_RESPONSE — full first /outbound payload (for schema debugging).
  • SAMPLE_REQUEST — the request body the widget sent (so you can see the searchKey it used).
  • FLIGHT_LIST_HTML — only written when no flights were captured, to help diagnose layout drift.

Running locally

npm install
npx playwright install chromium
mkdir -p storage/key_value_stores/default
cat > storage/key_value_stores/default/INPUT.json <<'JSON'
{
  "originQuery": "Fortaleza",
  "destinationQuery": "Sao Paulo",
  "departureDate": "2026-08-15",
  "maxResults": 20
}
JSON
CRAWLEE_HEADLESS=1 APIFY_LOCAL_STORAGE_DIR=./storage npm start

Results land in storage/datasets/default/.

Running on Apify

apify login
apify push

The Docker image is based on apify/actor-node-playwright-chrome:24 so Chromium + Playwright are preinstalled.

Tests

npm test

The unit tests cover the pure helpers (isIata, buildFlightListUrl). The browser-driven path is intentionally not run in CI — it depends on the live OnerTravel backend and takes ~25 s per scenario.
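For a sense of what such a pure helper might look like, here is a minimal sketch of an isIata check. The rule used (exactly three ASCII letters, case-insensitive) is an assumption; the actor's real implementation is not shown in this README and may differ.

```javascript
// Assumed rule: exactly three ASCII letters after trimming, case-insensitive.
// The actor's real helper may apply a different check.
function isIata(value) {
  return typeof value === 'string' && /^[A-Za-z]{3}$/.test(value.trim());
}

console.log(isIata('FOR'));       // true
console.log(isIata('Fortaleza')); // false
```

A check this simple cannot tell a code from a three-letter city name such as "Rio", so a stricter helper might confirm the code against the airport lookup.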

When this might break

  • OnerTravel renames /api/flight/v1/search/{outbound,inbound} or changes the request DTO. Inspect widget-befly.js — the path strings and apiBaseUrlFlight are unminified inside.
  • Terratur migrates to a different OnerTravel institution/agent. The widget pulls agencyPath from https://api.onertravel.com/api/institutionWidgetConfiguration (Origin: https://terratur.tur.br). Override tenantUrl in the input.
  • The WebSocket protocol changes the ENDED frame format. The framereceived handler does a substring match — adjust to taste.
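If the ENDED frame format does change, the substring match is the piece to adjust. The sketch below shows one way such a handler could be wired with Playwright's page.on('websocket') and ws.on('framereceived') events. The function names, the `onEnded` callback and the URL prefix check are assumptions, and the frame contents in the usage note are made up, since the real frame format is not documented here.

```javascript
// Returns true when a WebSocket frame payload contains the marker.
// Playwright passes payload as a string (text frame) or a Buffer (binary frame).
function isEndedFrame(payload, marker = 'ENDED') {
  const text = typeof payload === 'string'
    ? payload
    : Buffer.from(payload).toString('utf8');
  return text.includes(marker);
}

// Wiring sketch: `page` is a Playwright Page, `onEnded` a hypothetical callback.
function watchForEnded(page, onEnded) {
  page.on('websocket', (ws) => {
    // Ignore sockets that do not belong to the OnerTravel event endpoint.
    if (!ws.url().startsWith('wss://event.onertravel.com/')) return;
    ws.on('framereceived', ({ payload }) => {
      if (isEndedFrame(payload)) onEnded();
    });
  });
}
```

Passing a different marker, for example `isEndedFrame(payload, 'FINISHED')`, would be enough if only the keyword changes; a structural change to the frame would call for parsing the payload instead.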