# Terratur Flight Search Scraper (`bchiaramonti/terratur-flight-search-scraper`) Actor

Scrapes flight offers from terratur.tur.br (Terratur, Brazilian travel agency on OnerTravel/Befly). Inputs origin/destination (IATA or city), dates and passengers; returns airline, price in BRL, times, stops, baggage and segment-by-segment data as JSON. Handles one-way and round-trip.

- **URL**: https://apify.com/bchiaramonti/terratur-flight-search-scraper.md
- **Developed by:** [Bruno Chiaramonti](https://apify.com/bchiaramonti) (community)
- **Categories:** Travel, E-commerce
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage, which gets cheaper the higher subscription plan you have.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Terratur Flight Search Scraper

An Apify Actor that searches flight tickets on **[terratur.tur.br](https://terratur.tur.br/)** and writes the offers (price, airline, segments, baggage, etc.) to an Apify Dataset.

terratur.tur.br is a WordPress site that embeds the **OnerTravel / Befly** white-label flight widget. When a visitor submits the form, the widget redirects to the tenant booking page at `https://www.comprarviagem.com.br/terratur/flight-list?…`. That page is an Angular SPA, and the actual flight search is asynchronous: an HTTP `POST` kicks off a Lambda job, and the results stream in via a WebSocket (`wss://event.onertravel.com/production`). Re-implementing that protocol from scratch would be fragile, so this Actor takes the pragmatic route: it loads the tenant page in headless Chromium with `PlaywrightCrawler` and intercepts the `/api/flight/v1/search/{outbound,inbound}` XHR responses as the browser receives them.

### Input

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| `originQuery` | string | `"Fortaleza"` | City name (`"Fortaleza"`) or IATA code (`"FOR"`). |
| `destinationQuery` | string | `"Sao Paulo"` | City name or IATA code. |
| `departureDate` | string (`YYYY-MM-DD`) | — *(required)* | Outbound date. |
| `returnDate` | string (`YYYY-MM-DD`) | — | Leave empty for one-way trips. |
| `adults` | integer | `1` | 12+ years. |
| `children` | integer | `0` | 2–11 years. |
| `infants` | integer | `0` | < 2 years. |
| `maxResults` | integer | `100` | Cap on stored offers (outbound + inbound). |
| `waitForResultsSeconds` | integer | `60` | Per-direction settle window. The Actor returns early when the WebSocket sends `ENDED` or the result count stays stable for ~10 s. |
| `includeInbound` | boolean | `true` | On round trips, replay the inbound XHR (using the captured `searchKey` + first outbound `flightKey`) to also collect return-leg offers. |
| `tenantUrl` | string | `https://www.comprarviagem.com.br/terratur` | Booking-side host. Terratur is an agent of *Comprar Viagem* under the OnerTravel platform; change this only if Terratur migrates tenants. |
| `proxyConfiguration` | proxy | — | Optional Apify Proxy. Leave empty when running locally without credentials. |

### Output

Each Dataset item is one offer:

```json
{
  "direction": "outbound",
  "key": "22d9e9f3-…",
  "airline": "GOL LINHAS AEREAS",
  "airlineIata": "G3",
  "flightNumbers": ["1991", "1635"],
  "origin": "FOR",
  "originCity": "Fortaleza",
  "destination": "CGH",
  "destinationCity": "São Paulo",
  "departure": "2026-08-15T11:55",
  "arrival":   "2026-08-15T17:10",
  "durationMinutes": 315,
  "stops": 1,
  "cabinClass": "Econômica",
  "fareFamily": "LIGHT",
  "allowedBaggage": false,
  "baggageAllowance": [ { "type": 1, "unitDescription": "KG", "quantity": 1, "weight": 10 } ],
  "price": 905.49,
  "priceBase": 840.11,
  "priceTax": 65.38,
  "currency": "BRL",
  "segments": [ /* per-leg breakdown */ ],
  "raw": { /* full OnerTravel response for this offer */ }
}
````

The full raw object is retained under `raw` so downstream consumers can pick fields the normaliser may have missed.
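
As an illustration of consuming these items downstream, here is a minimal sketch that picks the cheapest offer per `direction`, optionally limited by `stops`. `cheapestByDirection` is a hypothetical helper, and the sample items are made up and trimmed to the fields the helper reads.

```javascript
// Hypothetical downstream helper: keeps the lowest-priced item per direction.
// Items without a numeric price, or with more stops than allowed, are skipped.
function cheapestByDirection(items, { maxStops = Infinity } = {}) {
    const best = {};
    for (const item of items) {
        if (typeof item.price !== 'number' || item.stops > maxStops) continue;
        const current = best[item.direction];
        if (!current || item.price < current.price) best[item.direction] = item;
    }
    return best;
}

// Made-up sample data in the shape of the dataset items shown above.
const sampleOffers = [
    { direction: 'outbound', airlineIata: 'G3', stops: 1, price: 905.49 },
    { direction: 'outbound', airlineIata: 'XX', stops: 0, price: 1120.0 },
    { direction: 'inbound', airlineIata: 'XX', stops: 0, price: 780.5 },
];

const cheapest = cheapestByDirection(sampleOffers);
const cheapestNonstop = cheapestByDirection(sampleOffers, { maxStops: 0 });
```

With this sample, `cheapest.outbound` is the 905.49 offer, while `cheapestNonstop.outbound` is the 1120.0 offer because the cheaper one has a stop.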

### How it works

1. **Resolve airports.** Hits `https://api.onertravel.com/api/airport/search?name=…&isDeparture=…` to turn city names into IATA codes. If you already pass IATA codes the lookup is skipped.
2. **Open the tenant flight-list page.** Builds the canonical URL the Befly widget would redirect to and loads it with `PlaywrightCrawler`. Cookies, headers and CORS context are inherited from the page so Lambda requests look exactly like the widget's.
3. **Listen to the page.** Two listeners run for the lifetime of the page load:
   - `page.on('response', …)` captures every `/api/flight/v1/search/outbound` and `/inbound` XHR and parses the flight array.
   - `page.on('websocket', …)` watches `wss://event.onertravel.com/production`; the `ENDED` frame is our signal that the outbound search is done.
4. **Wait for outbound to settle.** Returns early on `ENDED` or when the count stays stable for ~10 seconds.
5. **Replay inbound (round trips only).** Uses `page.request.post(…/inbound)` with `{searchKey, flightKey: firstOutbound.key, page, pageSize, filter}` — the inbound Lambda needs both keys, see the `Be` DTO in the OnerTravel bundle.
6. **Normalise + push.** Each flight is flattened to a query-friendly shape (ISO timestamps, IATA codes, minutes), deduplicated by `key`, then `Actor.pushData()`'d.
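
The settle rule in step 4 can be sketched as a small pure function plus a tracker. This is an illustration under assumptions, not the Actor's code: the names `shouldStopWaiting` and `createSettleTracker` are hypothetical, timestamps are passed in explicitly so the logic is easy to test, and the sketch assumes the "stable" rule only applies once at least one result has arrived.

```javascript
// Hypothetical decision function: returns the reason to stop waiting, or null.
// All times are milliseconds; defaults mirror the ~10 s / 60 s values in the docs.
function shouldStopWaiting({ ended, now, startedAt, lastChangeAt, stableMs = 10_000, maxWaitMs = 60_000 }) {
    if (ended) return 'ended'; // the WebSocket reported ENDED
    if (now - startedAt >= maxWaitMs) return 'timeout'; // waitForResultsSeconds elapsed
    // Assumption: stability only counts after the first results were seen.
    if (lastChangeAt !== null && now - lastChangeAt >= stableMs) return 'stable';
    return null;
}

// Hypothetical tracker that the response and WebSocket listeners could feed.
function createSettleTracker(startedAt) {
    let count = 0;
    let lastChangeAt = null;
    let ended = false;
    return {
        // Call with the running total of captured flights.
        onCount(newCount, now) {
            if (newCount !== count) {
                count = newCount;
                lastChangeAt = now;
            }
        },
        // Call when the ENDED frame is observed.
        onEnded() {
            ended = true;
        },
        // Poll periodically; a non-null result means "stop waiting".
        check(now, options) {
            return shouldStopWaiting({ ended, now, startedAt, lastChangeAt, ...options });
        },
    };
}
```

In a real page handler, `now` would come from `Date.now()` and `check` would be polled on a short interval until it returns a reason.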

The Actor also persists three debug values in the default key-value store:

- `SAMPLE_RESPONSE` — full first `/outbound` payload (for schema debugging).
- `SAMPLE_REQUEST` — the request body the widget sent (so you can see the `searchKey` it used).
- `FLIGHT_LIST_HTML` — only written when no flights were captured, to help diagnose layout drift.

### Running locally

```bash
npm install
npx playwright install chromium

mkdir -p storage/key_value_stores/default
cat > storage/key_value_stores/default/INPUT.json <<'JSON'
{
  "originQuery": "Fortaleza",
  "destinationQuery": "Sao Paulo",
  "departureDate": "2026-08-15",
  "maxResults": 20
}
JSON

CRAWLEE_HEADLESS=1 APIFY_LOCAL_STORAGE_DIR=./storage npm start
```

Results land in `storage/datasets/default/`.

### Running on Apify

```bash
apify login
apify push
```

The Docker image is based on `apify/actor-node-playwright-chrome:24` so Chromium + Playwright are preinstalled.

### Tests

```bash
npm test
```

The unit tests cover the pure helpers (`isIata`, `buildFlightListUrl`). The browser-driven path is intentionally not run in CI — it depends on the live OnerTravel backend and takes ~25 s per scenario.
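
To give a feel for what such a pure helper looks like, here is a hypothetical re-implementation of `isIata`. The real helper may differ; this sketch assumes a query counts as an IATA code when it is exactly three ASCII letters after trimming, in either case.

```javascript
// Hypothetical sketch, not the Actor's actual helper.
// Treats any trimmed three-letter ASCII string as an IATA code.
function isIata(query) {
    return typeof query === 'string' && /^[A-Za-z]{3}$/.test(query.trim());
}

console.log(isIata('FOR'));       // true
console.log(isIata('Fortaleza')); // false
```

One consequence of this rule is that a three-letter city name such as `"Rio"` would be treated as a code and skip the airport lookup, so a stricter implementation might require upper case.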

### When this might break

- OnerTravel renames `/api/flight/v1/search/{outbound,inbound}` or changes the request DTO. Inspect [`widget-befly.js`](https://static.onertravel.com/widget/search/production/widget-befly.js) — the path strings and `apiBaseUrlFlight` are unminified inside.
- Terratur migrates to a different OnerTravel institution/agent. The widget pulls `agencyPath` from `https://api.onertravel.com/api/institutionWidgetConfiguration` (`Origin: https://terratur.tur.br`). Override `tenantUrl` in the input.
- The WebSocket protocol changes the `ENDED` frame format. The `framereceived` handler does a substring match — adjust to taste.
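
For the last point, a substring match and a stricter alternative can be compared side by side. This is a sketch under assumptions: `isEndedFrame` is a hypothetical name, and the strict mode assumes the frame is JSON carrying the string `ENDED` as a value somewhere, which this README does not confirm.

```javascript
// Recursively checks whether a parsed JSON value contains `target` as a value.
function containsValue(node, target) {
    if (node === target) return true;
    if (node && typeof node === 'object') {
        return Object.values(node).some((child) => containsValue(child, target));
    }
    return false;
}

// Hypothetical matcher for a Playwright `framereceived` payload (string or Buffer).
function isEndedFrame(payload, { strict = false } = {}) {
    const text = typeof payload === 'string' ? payload : payload.toString('utf8');
    // Loose mode: plain substring match, as described above.
    if (!strict) return text.includes('ENDED');
    // Strict mode (assumes JSON frames): ENDED must be an exact string value.
    try {
        return containsValue(JSON.parse(text), 'ENDED');
    } catch {
        return false; // not JSON, so strict mode cannot confirm it
    }
}
```

The loose mode tolerates format changes but can misfire on any frame that merely contains the letters `ENDED`; the strict mode avoids that at the cost of depending on the frame being JSON.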

# Actor input Schema

## `originQuery` (type: `string`):

City name (e.g. 'Fortaleza') or IATA code (e.g. 'FOR').

## `destinationQuery` (type: `string`):

City name (e.g. 'Sao Paulo') or IATA code (e.g. 'GRU').

## `departureDate` (type: `string`):

Outbound flight date in YYYY-MM-DD format.

## `returnDate` (type: `string`):

Return flight date in YYYY-MM-DD format. Leave empty for one-way trips.

## `adults` (type: `integer`):

Number of adult passengers (12+ years).

## `children` (type: `integer`):

Number of children (2-11 years).

## `infants` (type: `integer`):

Number of infants (under 2 years).

## `maxResults` (type: `integer`):

Maximum number of flight offers to keep.

## `waitForResultsSeconds` (type: `integer`):

How long to wait per direction (outbound / inbound) for the OnerTravel WebSocket stream to settle. The actor stops earlier when results stabilise or the ENDED frame arrives.

## `includeInbound` (type: `boolean`):

When a returnDate is set, the actor selects the first outbound flight to unlock the inbound list and captures both legs. Disable to skip the click and only collect outbound options.

## `tenantUrl` (type: `string`):

Base URL of the OnerTravel tenant that powers terratur.tur.br. Default points at the Terratur tenant under Comprar Viagem.

## `proxyConfiguration` (type: `object`):

Optional Apify Proxy configuration to avoid IP-based rate limits. Leave empty when running locally without Apify credentials.

## Actor input object example

```json
{
  "originQuery": "Fortaleza",
  "destinationQuery": "Sao Paulo",
  "departureDate": "2026-08-15",
  "adults": 1,
  "children": 0,
  "infants": 0,
  "maxResults": 100,
  "waitForResultsSeconds": 60,
  "includeInbound": true,
  "tenantUrl": "https://www.comprarviagem.com.br/terratur"
}
```

# Actor output Schema

## `results` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "originQuery": "Fortaleza",
    "destinationQuery": "Sao Paulo",
    "departureDate": "2026-08-15"
};

// Run the Actor and wait for it to finish
const run = await client.actor("bchiaramonti/terratur-flight-search-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "originQuery": "Fortaleza",
    "destinationQuery": "Sao Paulo",
    "departureDate": "2026-08-15",
}

# Run the Actor and wait for it to finish
run = client.actor("bchiaramonti/terratur-flight-search-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{"originQuery": "Fortaleza", "destinationQuery": "Sao Paulo", "departureDate": "2026-08-15"}' |
apify call bchiaramonti/terratur-flight-search-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=bchiaramonti/terratur-flight-search-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Terratur Flight Search Scraper",
        "description": "Scrapes flight offers from terratur.tur.br (Terratur, Brazilian travel agency on OnerTravel/Befly). Inputs origin/destination (IATA or city), dates and passengers; returns airline, price in BRL, times, stops, baggage and segment-by-segment data as JSON. Handles one-way and round-trip.",
        "version": "0.0",
        "x-build-id": "fE9OdZf8d5x8RRqlw"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/bchiaramonti~terratur-flight-search-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-bchiaramonti-terratur-flight-search-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/bchiaramonti~terratur-flight-search-scraper/runs": {
            "post": {
                "operationId": "runs-sync-bchiaramonti-terratur-flight-search-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/bchiaramonti~terratur-flight-search-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-bchiaramonti-terratur-flight-search-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "originQuery",
                    "destinationQuery",
                    "departureDate"
                ],
                "properties": {
                    "originQuery": {
                        "title": "Origin city or IATA code",
                        "type": "string",
                        "description": "City name (e.g. 'Fortaleza') or IATA code (e.g. 'FOR').",
                        "default": "Fortaleza"
                    },
                    "destinationQuery": {
                        "title": "Destination city or IATA code",
                        "type": "string",
                        "description": "City name (e.g. 'Sao Paulo') or IATA code (e.g. 'GRU').",
                        "default": "Sao Paulo"
                    },
                    "departureDate": {
                        "title": "Departure date (YYYY-MM-DD)",
                        "type": "string",
                        "description": "Outbound flight date in YYYY-MM-DD format.",
                        "default": "2026-08-15"
                    },
                    "returnDate": {
                        "title": "Return date (YYYY-MM-DD, optional)",
                        "type": "string",
                        "description": "Return flight date in YYYY-MM-DD format. Leave empty for one-way trips."
                    },
                    "adults": {
                        "title": "Adults",
                        "minimum": 1,
                        "maximum": 9,
                        "type": "integer",
                        "description": "Number of adult passengers (12+ years).",
                        "default": 1
                    },
                    "children": {
                        "title": "Children",
                        "minimum": 0,
                        "maximum": 8,
                        "type": "integer",
                        "description": "Number of children (2-11 years).",
                        "default": 0
                    },
                    "infants": {
                        "title": "Infants",
                        "minimum": 0,
                        "maximum": 4,
                        "type": "integer",
                        "description": "Number of infants (under 2 years).",
                        "default": 0
                    },
                    "maxResults": {
                        "title": "Max results",
                        "minimum": 1,
                        "maximum": 500,
                        "type": "integer",
                        "description": "Maximum number of flight offers to keep.",
                        "default": 100
                    },
                    "waitForResultsSeconds": {
                        "title": "Wait window (seconds)",
                        "minimum": 10,
                        "maximum": 180,
                        "type": "integer",
                        "description": "How long to wait per direction (outbound / inbound) for the OnerTravel WebSocket stream to settle. The actor stops earlier when results stabilise or the ENDED frame arrives.",
                        "default": 60
                    },
                    "includeInbound": {
                        "title": "Include inbound flights (round trips)",
                        "type": "boolean",
                        "description": "When a returnDate is set, the actor selects the first outbound flight to unlock the inbound list and captures both legs. Disable to skip the click and only collect outbound options.",
                        "default": true
                    },
                    "tenantUrl": {
                        "title": "Tenant booking URL",
                        "type": "string",
                        "description": "Base URL of the OnerTravel tenant that powers terratur.tur.br. Default points at the Terratur tenant under Comprar Viagem.",
                        "default": "https://www.comprarviagem.com.br/terratur"
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Optional Apify Proxy configuration to avoid IP-based rate limits. Leave empty when running locally without Apify credentials."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
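
For projects that call the REST API directly, the `run-sync-get-dataset-items` path from the specification above can be used without a client library. The following is a minimal sketch: `buildRunSyncRequest` and `runSync` are hypothetical helper names, and `runSync` assumes a runtime with a global `fetch` (for example Node.js 18 or newer).

```javascript
const API_BASE = 'https://api.apify.com/v2';
const ACTOR_PATH = 'bchiaramonti~terratur-flight-search-scraper';

// Builds the URL and fetch options for the synchronous run endpoint.
// The token goes in the query string, as declared in the specification.
function buildRunSyncRequest(token, input) {
    const url = new URL(`${API_BASE}/acts/${ACTOR_PATH}/run-sync-get-dataset-items`);
    url.searchParams.set('token', token);
    return {
        url: url.toString(),
        options: {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify(input),
        },
    };
}

// Runs the Actor and resolves with the dataset items returned by the endpoint.
async function runSync(token, input) {
    const { url, options } = buildRunSyncRequest(token, input);
    const response = await fetch(url, options);
    if (!response.ok) {
        throw new Error(`Apify API returned HTTP ${response.status}`);
    }
    return response.json();
}
```

A call such as `runSync('<YOUR_API_TOKEN>', { originQuery: 'FOR', destinationQuery: 'GRU', departureDate: '2026-08-15' })` blocks until the run finishes, so the other endpoints may suit long searches better.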
