# Tatoeba Sentence Corpus Scraper (`parseforge/tatoeba-sentence-corpus-scraper`) Actor

Extract the Tatoeba sentence corpus, with millions of bilingual example sentences. Capture sentence ID, language, text, owner, audio URL, translations, tags, and license. Export to JSON, CSV, or Excel for language learning, NLP training data, translation memory, and linguistic research.

- **URL**: https://apify.com/parseforge/tatoeba-sentence-corpus-scraper.md
- **Developed by:** [ParseForge](https://apify.com/parseforge) (community)
- **Categories:** Education, Developer tools, Other
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $10.00 / 1,000 result items

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Since this Actor supports Apify Store discounts, the price drops as your subscription plan goes up.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
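
As a rough budgeting aid, the per-result rate above can be turned into a quick estimate. This is a minimal sketch: `estimate_cost_usd` is a hypothetical helper, and it assumes the listed starting rate of $10.00 per 1,000 result items is the only billed event and that no Store discount applies, so the real charge may be lower.

```python
def estimate_cost_usd(result_items: int, price_per_1000_usd: float = 10.00) -> float:
    """Estimate the charge for a run that returns `result_items` items.

    Assumption: a flat rate per 1,000 result items and no plan discount.
    """
    if result_items < 0:
        raise ValueError("result_items must be non-negative")
    return round(result_items / 1000 * price_per_1000_usd, 2)


print(estimate_cost_usd(50))      # 0.5
print(estimate_cost_usd(10_000))  # 100.0
```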

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

![ParseForge Banner](https://github.com/ParseForge/apify-assets/blob/ad35ccc13ddd068b9d6cba33f323962e39aed5b2/banner.jpg?raw=true)

## 🗣️ Tatoeba Sentence Corpus Scraper

> 🚀 **Export the world's largest open multilingual sentence corpus in seconds.** Pull **12,000,000+ example sentences** across **400+ languages** with translations, audio, contributor info, and CC-BY licence metadata. No login, no manual CSV stitching.

> 🕒 **Last updated:** 2026-05-23 · **📊 14 fields** per record · **🗣️ 12M+ sentences** · **🌍 400+ languages** · **🔊 Audio + translations**

The **Tatoeba Sentence Corpus Scraper** taps into the Tatoeba community catalog and returns **14 structured fields per sentence**, including the original text, language code and name, translation list, audio links, contributor handle, correctness score, and licence. Tatoeba has been collaboratively edited by linguists, polyglots, and language learners since 2006, and ships under a permissive Creative Commons licence.

The catalog covers **every major living language family plus dozens of constructed, classical, and minority languages**, from Mandarin and Spanish down to Latin, Esperanto, and revived regional tongues. This Actor turns that into a clean CSV, Excel, JSON, or XML dataset in under five minutes, with all filtering done server-side so you skip the parsing entirely.

| 🎯 Target Audience | 💡 Primary Use Cases |
|---|---|
| Linguists, language-learning app builders, translation researchers, NLP engineers, lexicographers, ESL teachers, audio dataset curators | Parallel corpus mining, flashcard sourcing, translation memory seeding, speech model training, idiom and proverb research, classroom example banks |

---

### 📋 What the Tatoeba Scraper does

Four sentence-mining workflows in a single run:

- 🔎 **Keyword search.** Find every sentence containing a target word or phrase.
- 🌐 **Source language filter.** Pick a single source language out of 400+ (Tatoeba uses ISO 639-3 codes).
- ↔️ **Target translation filter.** Restrict to sentences that have a translation into a chosen language.
- 🏷️ **Tag filter.** Pull only sentences tagged with concepts like `proverb`, `idiom`, `greeting`, or any community label.

Each record includes the sentence ID, raw text, language code and human-readable name, every linked translation (with that translation's language), audio file URLs when available, contributor handle, correctness score, licence, and the canonical Tatoeba page link.

> 💡 **Why it matters:** clean, licence-clear parallel sentences are the raw material of every translation memory, language-learning flashcard, and speech model training set. Building your own pipeline against the Tatoeba site means writing fragile HTML parsers and respecting rate limits by hand. This Actor delivers the same data structured and ready to import.

---

### 🎬 Full Demo

_🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded sentence corpus._

---

### ⚙️ Input

<table>
<thead>
<tr><th>Input</th><th>Type</th><th>Default</th><th>Behavior</th></tr>
</thead>
<tbody>
<tr><td>maxItems</td><td>integer</td><td>10</td><td>Sentences to return. Free plan caps at 10, paid plan at 1,000,000.</td></tr>
<tr><td>query</td><td>string</td><td>"hello"</td><td>Keyword search. Empty browses the chosen language without a filter.</td></tr>
<tr><td>fromLanguage</td><td>string</td><td>"eng"</td><td>Source language (ISO 639-3). 50 most common languages exposed.</td></tr>
<tr><td>toLanguage</td><td>string</td><td>""</td><td>Target translation language (ISO 639-3). Empty returns all translations.</td></tr>
<tr><td>tags</td><td>array</td><td>[]</td><td>Filter by community tag names like proverb, idiom.</td></tr>
</tbody>
</table>

**Example: 50 English sentences containing "morning" with Spanish translations.**

```json
{
    "maxItems": 50,
    "query": "morning",
    "fromLanguage": "eng",
    "toLanguage": "spa"
}
````

**Example: 100 Japanese proverbs with translations into any language.**

```json
{
    "maxItems": 100,
    "query": "",
    "fromLanguage": "jpn",
    "tags": ["proverb"]
}
```
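
If you assemble these inputs in code rather than by hand, a small builder keeps optional filters out of the payload when they are unset. This is a minimal sketch: `build_input` is a hypothetical helper, not part of the Actor or the Apify client, and the 1 to 1,000,000 range for `maxItems` is taken from the input schema.

```python
from typing import Any, Dict, List, Optional


def build_input(
    from_language: str,
    query: str = "",
    to_language: str = "",
    tags: Optional[List[str]] = None,
    max_items: int = 10,
) -> Dict[str, Any]:
    """Build an Actor input dict, omitting optional filters that are empty."""
    if not 1 <= max_items <= 1_000_000:
        raise ValueError("maxItems must be between 1 and 1,000,000")
    actor_input: Dict[str, Any] = {
        "maxItems": max_items,
        "query": query,
        "fromLanguage": from_language,
    }
    if to_language:
        actor_input["toLanguage"] = to_language
    if tags:
        actor_input["tags"] = list(tags)
    return actor_input


# Reproduces the two example inputs shown above.
print(build_input("eng", query="morning", to_language="spa", max_items=50))
print(build_input("jpn", tags=["proverb"], max_items=100))
```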

> ⚠️ **Good to Know:** Tatoeba is a community-edited corpus. Correctness scores reflect peer review, but expect occasional informal or regional phrasings. For production translation memory, weight by `correctness` and prefer sentences with multiple contributor confirmations.
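
One way to apply that advice downstream is to drop low-scoring records before they reach your translation memory. This is a minimal sketch: `rank_by_correctness` is a hypothetical helper, and it assumes that a higher `correctness` value means a better-reviewed sentence and that `1` (the value in the sample records) is a sensible default threshold. Check both assumptions against your own data.

```python
from typing import Any, Dict, List


def rank_by_correctness(
    records: List[Dict[str, Any]], min_correctness: int = 1
) -> List[Dict[str, Any]]:
    """Keep records at or above the threshold, highest score first."""
    kept = [
        record
        for record in records
        if isinstance(record.get("correctness"), (int, float))
        and record["correctness"] >= min_correctness
    ]
    return sorted(kept, key=lambda record: record["correctness"], reverse=True)


records = [
    {"sentenceId": 1276, "correctness": 1},
    {"sentenceId": 999001, "correctness": -1},  # hypothetical low-scoring record
]
print([record["sentenceId"] for record in rank_by_correctness(records)])  # [1276]
```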

***

### 📊 Output

Each sentence record contains **14 fields**. Download the dataset as CSV, Excel, JSON, or XML.

#### 🧾 Schema

| Field | Type | Example |
|---|---|---|
| 🆔 `sentenceId` | number | `1276` |
| 💬 `text` | string | `"Let's try something."` |
| 🌐 `language` | string | `"eng"` |
| 🗺️ `languageName` | string | `"English"` |
| ↔️ `direction` | string | `"source"` |
| ✅ `correctness` | number | `1` |
| 📜 `license` | string | `"CC BY 2.0 FR"` |
| 👤 `contributor` | string | `"CK"` |
| 🔊 `hasAudio` | boolean | `true` |
| 🎧 `audioUrls` | array | `["https://tatoeba.org/audio/download/1276"]` |
| 🔢 `translationCount` | number | `12` |
| 🌍 `translations` | array | `[{"id":1277,"text":"Probemos algo.","language":"spa"}]` |
| 🔗 `url` | string | `"https://tatoeba.org/eng/sentences/show/1276"` |
| 🕒 `scrapedAt` | ISO 8601 | `"2026-05-23T00:00:00.000Z"` |

#### 📦 Sample records

<details>
<summary><strong>🇬🇧 English sentence with Spanish translation</strong></summary>

```json
{
    "sentenceId": 1276,
    "text": "Let's try something.",
    "language": "eng",
    "languageName": "English",
    "direction": "source",
    "correctness": 1,
    "license": "CC BY 2.0 FR",
    "contributor": "CK",
    "hasAudio": true,
    "audioUrls": ["https://tatoeba.org/audio/download/1276"],
    "translationCount": 12,
    "translations": [
        {"id": 1277, "text": "Probemos algo.", "language": "spa"},
        {"id": 5045, "text": "Essayons quelque chose.", "language": "fra"}
    ],
    "url": "https://tatoeba.org/eng/sentences/show/1276",
    "scrapedAt": "2026-05-23T00:00:00.000Z"
}
```

</details>

<details>
<summary><strong>🇯🇵 Japanese proverb with multiple translations</strong></summary>

```json
{
    "sentenceId": 75321,
    "text": "猿も木から落ちる。",
    "language": "jpn",
    "languageName": "Japanese",
    "direction": "source",
    "correctness": 1,
    "license": "CC BY 2.0 FR",
    "contributor": "bunbuku",
    "hasAudio": false,
    "audioUrls": [],
    "translationCount": 8,
    "translations": [
        {"id": 75322, "text": "Even monkeys fall from trees.", "language": "eng"},
        {"id": 290118, "text": "Hasta los monos caen de los árboles.", "language": "spa"}
    ],
    "url": "https://tatoeba.org/eng/sentences/show/75321",
    "scrapedAt": "2026-05-23T00:00:00.000Z"
}
```

</details>

<details>
<summary><strong>🇪🇸 Spanish greeting tagged "greeting"</strong></summary>

```json
{
    "sentenceId": 386,
    "text": "¡Hola!",
    "language": "spa",
    "languageName": "Spanish",
    "direction": "source",
    "correctness": 1,
    "license": "CC BY 2.0 FR",
    "contributor": "alexmarcelo",
    "hasAudio": true,
    "audioUrls": ["https://tatoeba.org/audio/download/386"],
    "translationCount": 22,
    "translations": [
        {"id": 387, "text": "Hello!", "language": "eng"},
        {"id": 388, "text": "Salut!", "language": "fra"}
    ],
    "url": "https://tatoeba.org/eng/sentences/show/386",
    "scrapedAt": "2026-05-23T00:00:00.000Z"
}
```

</details>
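
For parallel-corpus work you will usually want flat source/target pairs rather than nested records. This is a minimal sketch, assuming records shaped like the samples above. `to_pairs` is a hypothetical helper name.

```python
from typing import Any, Dict, List, Tuple


def to_pairs(records: List[Dict[str, Any]], target_language: str) -> List[Tuple[str, str]]:
    """Flatten records into (source text, translated text) pairs for one language."""
    pairs = []
    for record in records:
        for translation in record.get("translations", []):
            if translation.get("language") == target_language:
                pairs.append((record["text"], translation["text"]))
    return pairs


record = {
    "sentenceId": 1276,
    "text": "Let's try something.",
    "language": "eng",
    "translations": [
        {"id": 1277, "text": "Probemos algo.", "language": "spa"},
        {"id": 5045, "text": "Essayons quelque chose.", "language": "fra"},
    ],
}
for source, target in to_pairs([record], "spa"):
    print(source, "->", target)  # Let's try something. -> Probemos algo.
```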

***

### ✨ Why choose this Actor

| | Capability |
|---|---|
| 🌍 | **400+ language coverage.** Major world languages, classical languages, constructed languages, and revived minority tongues. |
| 🎯 | **Combined filters.** Source language, target language, keyword, and tag filters apply together in a single run. |
| 🔊 | **Audio links included.** Native-speaker recordings are flagged and linked when the contributor uploaded one. |
| 📜 | **Clear licensing.** Every record carries its Creative Commons licence string. |
| ⚡ | **Fast.** 10 sentences in under 5 seconds, 10,000 records in under 2 minutes. |
| 🔁 | **Always fresh.** Every run hits the live catalog so new community submissions are picked up. |
| 🚫 | **No authentication.** Public corpus, no API key required. |

> 📊 Parallel sentence corpora are the backbone of every translation memory, flashcard deck, and language-learning curriculum on the market.

***

### 📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Filters | Setup |
|---|---|---|---|---|---|
| **⭐ Tatoeba Scraper** *(this Actor)* | $5 free credit, then pay-per-use | **12M+** sentences, 400+ languages | **Live per run** | language, tag, keyword | ⚡ 2 min |
| Commercial translation memories | $500+/month | Domain-specific, limited languages | Quarterly | Industry slice | 🐢 Days |
| Custom site scraper | Free engineering | Manual | Depends on cron | Hand-built | ⏳ Weeks |
| Static corpus dumps | Free | Full but stale | Quarterly tarball | None | 🕒 Hours of parsing |

Pick this Actor when you want fresh community data, built-in filters, and a clean tabular result with zero parser maintenance.

***

### 🚀 How to use

1. 📝 **Sign up.** [Create a free account with $5 credit](https://console.apify.com/sign-up?fpr=vmoqkp) (takes 2 minutes).
2. 🌐 **Open the Actor.** Go to the Tatoeba Sentence Corpus Scraper page on the Apify Store.
3. 🎯 **Set input.** Pick a source language, optional keyword, optional target language and tags, set `maxItems`.
4. 🚀 **Run it.** Click **Start** and let the Actor collect your sentences.
5. 📥 **Download.** Grab your results from the **Dataset** tab as CSV, Excel, JSON, or XML.

> ⏱️ Total time from signup to downloaded corpus: **3-5 minutes.** No coding required.

***

### 💼 Business use cases

<table>
<tr>
<td width="50%" valign="top">

#### 📱 Language-Learning Apps

- Daily-phrase decks for streak-based apps
- CEFR-style example banks per skill level
- Idiom and proverb add-on packs
- Audio prompts for pronunciation drills

</td>
<td width="50%" valign="top">

#### 🤖 NLP & Machine Translation

- Seed parallel corpora for transformer fine-tuning
- Build evaluation sets for translation quality
- Augment domain corpora with everyday phrasing
- Train sentence embedding models

</td>
</tr>
<tr>
<td width="50%" valign="top">

#### 🎓 Linguistic Research

- Comparative syntax studies across language families
- Lexicographic exemplar collection
- Sociolinguistic surveys of register and dialect
- Reproducible corpus pulls with versioned licence info

</td>
<td width="50%" valign="top">

#### 🎙️ Speech & Audio Pipelines

- Voice-acted line banks for text-to-speech eval
- Pronunciation dictionaries with native audio
- Low-resource language audio collection
- Forced-alignment training material

</td>
</tr>
</table>

***

### 🔌 Automating Tatoeba Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

- 🟢 **Node.js.** Install the `apify-client` NPM package.
- 🐍 **Python.** Use the `apify-client` PyPI package.
- 📚 See the [Apify API documentation](https://docs.apify.com/api/v2) for full details.

The [Apify Schedules feature](https://docs.apify.com/platform/schedules) lets you trigger this Actor on any cron interval. Weekly refreshes keep your translation memory and flashcard banks in sync with the latest community additions.

***

### 🌟 Beyond business use cases

Data like this powers more than commercial workflows. The same structured sentences support research, education, civic projects, and personal initiatives.

<table>
<tr>
<td width="50%">

#### 🎓 Research and academia

- Comparative linguistics theses and papers
- Open-data exercises for NLP coursework
- Sociolinguistic survey corpora
- Reproducible studies citing exact dataset pulls

</td>
<td width="50%">

#### 🎨 Personal and creative

- Polyglot vocabulary journals and Anki decks
- Multilingual quote walls and printables
- Bilingual children's book drafts
- Hobbyist phrasebook apps for travel

</td>
</tr>
<tr>
<td width="50%">

#### 🤝 Non-profit and civic

- Language-revitalization materials for minority tongues
- Refugee-resettlement phrasebooks and trainings
- Free ESL classroom example banks
- Open-source translation projects for NGOs

</td>
<td width="50%">

#### 🧪 Experimentation

- Train sentence-similarity models
- Prototype voice assistants in low-resource languages
- Benchmark embedding models across language pairs
- Test bilingual interface copy with real example data

</td>
</tr>
</table>

***

### 🤖 Ask an AI assistant about this scraper

Open a ready-to-send prompt about this ParseForge Actor in the AI of your choice:

- 💬 [**ChatGPT**](https://chat.openai.com/?q=How%20do%20I%20use%20the%20Tatoeba%20Sentence%20Corpus%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20input%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20workflow.)
- 🧠 [**Claude**](https://claude.ai/new?q=How%20do%20I%20use%20the%20Tatoeba%20Sentence%20Corpus%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20input%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20workflow.)
- 🔍 [**Perplexity**](https://perplexity.ai/search?q=How%20do%20I%20use%20the%20Tatoeba%20Sentence%20Corpus%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20input%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20workflow.)
- 🅒 [**Copilot**](https://copilot.microsoft.com/?q=How%20do%20I%20use%20the%20Tatoeba%20Sentence%20Corpus%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20input%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20workflow.)

***

### ❓ Frequently Asked Questions

#### 🧩 How does it work?

Choose a source language, optional keyword, optional target language, and optional tags. Click Start and the Actor pulls matching sentences with translations, audio links, and licence info attached to each record.

#### 📏 How accurate are the translations?

Tatoeba sentences are peer-reviewed by the community. The `correctness` field reflects that review. Native-speaker contributions are common in major languages, while smaller languages may have a smaller pool of confirmations.

#### 🔁 How often is the corpus refreshed?

The Tatoeba project accepts new sentences and edits continuously. Every Actor run hits the live catalog, so fresh contributions appear in your dataset right away.

#### 🌐 Which languages are supported?

The corpus spans 400+ languages. The input form exposes the 50 most common languages by ISO 639-3 code (English, Spanish, Mandarin, Japanese, Arabic, German, French, and more). Less common languages can still be reached via translation links.

#### 🔊 Does every sentence have audio?

No. Audio is optional and depends on community uploads. The `hasAudio` flag tells you per record, and `audioUrls` carries the file links when present.
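
For speech pipelines, those two fields are enough to build a download manifest. This is a minimal sketch, assuming records shaped like the samples in the Output section. `audio_manifest` is a hypothetical helper, and it only collects links without downloading anything.

```python
from typing import Any, Dict, List


def audio_manifest(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return one manifest row per audio URL, skipping records without audio."""
    manifest = []
    for record in records:
        if not record.get("hasAudio"):
            continue
        for url in record.get("audioUrls", []):
            manifest.append(
                {
                    "sentenceId": record["sentenceId"],
                    "language": record["language"],
                    "text": record["text"],
                    "audioUrl": url,
                }
            )
    return manifest


records = [
    {"sentenceId": 386, "language": "spa", "text": "¡Hola!", "hasAudio": True,
     "audioUrls": ["https://tatoeba.org/audio/download/386"]},
    {"sentenceId": 75321, "language": "jpn", "text": "猿も木から落ちる。", "hasAudio": False,
     "audioUrls": []},
]
print(len(audio_manifest(records)))  # 1
```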

#### ⏰ Can I schedule regular runs?

Yes. Use Apify Schedules to trigger this Actor on any cron interval (daily, weekly, monthly) and keep your downstream corpus in sync.

#### ⚖️ Is this data legal to use?

Yes. Tatoeba sentences are published under a Creative Commons (CC BY) licence. Attribute the corpus and the individual contributor handles where applicable, and you can use the data commercially or non-commercially.

#### 💼 Can I use this data commercially?

Yes. CC BY allows commercial reuse with attribution. Bundle the licence string and contributor names in your downstream product, and you are good to go.
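
Because each record carries `contributor`, `license`, and `url`, a credit line can be generated per sentence. This is a minimal sketch: `attribution_line` is a hypothetical helper, and the output format is only one possible layout, so adjust it to whatever your use of the licence requires.

```python
from typing import Any, Dict


def attribution_line(record: Dict[str, Any]) -> str:
    """Format a one-line credit from a record's text, contributor, licence, and URL."""
    return '"{text}" by {contributor} ({license}), {url}'.format(
        text=record["text"],
        contributor=record["contributor"],
        license=record["license"],
        url=record["url"],
    )


record = {
    "text": "Let's try something.",
    "contributor": "CK",
    "license": "CC BY 2.0 FR",
    "url": "https://tatoeba.org/eng/sentences/show/1276",
}
print(attribution_line(record))
# "Let's try something." by CK (CC BY 2.0 FR), https://tatoeba.org/eng/sentences/show/1276
```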

#### 💳 Do I need a paid Apify plan to use this Actor?

No. The free Apify plan covers testing and small runs (10 records per run). A paid plan lifts the cap and unlocks scheduling, larger datasets, and higher concurrency.

#### 🔁 What happens if a run fails or gets interrupted?

Apify retries transient errors automatically. If a run still fails, inspect the log in the Runs tab, fix the input, and restart. Partial datasets are preserved so you never lose progress.

#### 🆘 What if I need help?

Our support team is here for you. Use the Apify platform messaging or the Tally form linked below.

***

### 🔌 Integrate with any app

Tatoeba Sentence Corpus Scraper connects to any cloud service via [Apify integrations](https://apify.com/integrations):

- [**Make**](https://docs.apify.com/platform/integrations/make) - Automate multi-step workflows
- [**Zapier**](https://docs.apify.com/platform/integrations/zapier) - Connect with 5,000+ apps
- [**Slack**](https://docs.apify.com/platform/integrations/slack) - Get run notifications in your channels
- [**Airbyte**](https://docs.apify.com/platform/integrations/airbyte) - Pipe sentence data into your warehouse
- [**GitHub**](https://docs.apify.com/platform/integrations/github) - Trigger runs from commits and releases
- [**Google Drive**](https://docs.apify.com/platform/integrations/drive) - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes. Push fresh sentences into your translation memory or alert your team in Slack.

***

### 🔗 Recommended Actors

- [**🌐 MyMemory Translation Scraper**](https://apify.com/parseforge/mymemory-translation-scraper) - Translate text across 70+ language pairs
- [**📚 LibriVox Audiobooks Scraper**](https://apify.com/parseforge/librivox-audiobooks-scraper) - Public-domain audiobooks with reader credits
- [**🏛️ Library of Congress Scraper**](https://apify.com/parseforge/loc-gov-library-of-congress-scraper) - 170M+ digitized cultural records
- [**📰 ArXiv Scraper**](https://apify.com/parseforge/arxiv-scraper) - Academic preprints with metadata
- [**📖 Figshare Scraper**](https://apify.com/parseforge/figshare-scraper) - Open research datasets and figures

> 💡 **Pro Tip:** browse the complete [ParseForge collection](https://apify.com/parseforge) for more reference-data scrapers.

***

**🆘 Need Help?** [**Open our contact form**](https://tally.so/r/BzdKgA) to request a new scraper, propose a custom data project, or report an issue.

***

> **⚠️ Disclaimer:** this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by the Tatoeba Project or its contributors. All trademarks mentioned are the property of their respective owners. Only publicly available open corpus data is collected, under the project's Creative Commons licence.

# Actor input Schema

## `maxItems` (type: `integer`):

Free users: Limited to 10 items (preview). Paid users: Optional, max 1,000,000

## `query` (type: `string`):

Text to search for in sentences. Leave empty to browse a language without a filter.

## `fromLanguage` (type: `string`):

Source language (ISO 639-3 code) for the sentences themselves.

## `toLanguage` (type: `string`):

Target translation language (ISO 639-3 code). Leave empty for all available translations.

## `tags` (type: `array`):

Filter sentences by Tatoeba tag names (e.g. proverb, idiom, greeting).

## Actor input object example

```json
{
  "maxItems": 10,
  "query": "hello",
  "fromLanguage": "eng"
}
```

# Actor output Schema

## `overview` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "maxItems": 10,
    "query": "hello",
    "fromLanguage": "eng"
};

// Run the Actor and wait for it to finish
const run = await client.actor("parseforge/tatoeba-sentence-corpus-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "maxItems": 10,
    "query": "hello",
    "fromLanguage": "eng",
}

# Run the Actor and wait for it to finish
run = client.actor("parseforge/tatoeba-sentence-corpus-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
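
If you would rather write the CSV yourself than download it from the Dataset tab, the items returned above can be serialised with the standard library. This is a minimal sketch: `items_to_csv` and `FIELDS` are hypothetical names, the column order follows the schema table in the README, and the nested `audioUrls` and `translations` values are stored as JSON strings, which is an assumption about what suits your downstream tools.

```python
import csv
import io
import json
from typing import Any, Dict, Iterable, TextIO

FIELDS = [
    "sentenceId", "text", "language", "languageName", "direction",
    "correctness", "license", "contributor", "hasAudio", "audioUrls",
    "translationCount", "translations", "url", "scrapedAt",
]
NESTED_FIELDS = ("audioUrls", "translations")


def items_to_csv(items: Iterable[Dict[str, Any]], out: TextIO) -> None:
    """Write dataset items to `out` as CSV, JSON-encoding nested fields."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for item in items:
        row = {field: item.get(field, "") for field in FIELDS}
        for field in NESTED_FIELDS:
            row[field] = json.dumps(row[field] or [], ensure_ascii=False)
        writer.writerow(row)


# Demo with an in-memory buffer and a trimmed sample record.
buffer = io.StringIO()
items_to_csv(
    [{"sentenceId": 386, "text": "¡Hola!", "language": "spa",
      "translations": [{"id": 387, "text": "Hello!", "language": "eng"}]}],
    buffer,
)
print(buffer.getvalue())
```

To write to disk instead, open the file with `open("sentences.csv", "w", newline="", encoding="utf-8")` and pass the file object as `out`.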

## CLI example

```bash
echo '{
  "maxItems": 10,
  "query": "hello",
  "fromLanguage": "eng"
}' |
apify call parseforge/tatoeba-sentence-corpus-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=parseforge/tatoeba-sentence-corpus-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Tatoeba Sentence Corpus Scraper",
        "description": "Extract Tatoeba sentence corpus with millions of bilingual example sentences. Capture sentence ID, language, text, owner, audio URL, translations, tags, and license. Export to JSON, CSV, or Excel for language learning, NLP training data, translation memory, and linguistic research.",
        "version": "1.0",
        "x-build-id": "l7K06EBMwbihcmTfT"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/parseforge~tatoeba-sentence-corpus-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-parseforge-tatoeba-sentence-corpus-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/parseforge~tatoeba-sentence-corpus-scraper/runs": {
            "post": {
                "operationId": "runs-sync-parseforge-tatoeba-sentence-corpus-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/parseforge~tatoeba-sentence-corpus-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-parseforge-tatoeba-sentence-corpus-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "maxItems": {
                        "title": "Max Items",
                        "minimum": 1,
                        "maximum": 1000000,
                        "type": "integer",
                        "description": "Free users: Limited to 10 items (preview). Paid users: Optional, max 1,000,000"
                    },
                    "query": {
                        "title": "Query",
                        "type": "string",
                        "description": "Text to search for in sentences. Leave empty to browse a language without a filter."
                    },
                    "fromLanguage": {
                        "title": "From Language",
                        "enum": [
                            "eng",
                            "spa",
                            "fra",
                            "deu",
                            "ita",
                            "por",
                            "rus",
                            "cmn",
                            "jpn",
                            "ara",
                            "nld",
                            "tur",
                            "kor",
                            "pol",
                            "ukr",
                            "ron",
                            "ell",
                            "heb",
                            "ces",
                            "hin",
                            "tha",
                            "vie",
                            "ind",
                            "msa",
                            "ben",
                            "fin",
                            "swe",
                            "nor",
                            "dan",
                            "hun",
                            "fil",
                            "yue",
                            "cat",
                            "epo",
                            "lat",
                            "fas",
                            "urd",
                            "tam",
                            "tel",
                            "mar",
                            "isl",
                            "slk",
                            "srp",
                            "hrv",
                            "bul",
                            "slv",
                            "est",
                            "lit",
                            "lav",
                            "afr"
                        ],
                        "type": "string",
                        "description": "Source language (ISO 639-3 code) for the sentences themselves."
                    },
                    "toLanguage": {
                        "title": "To Language",
                        "enum": [
                            "eng",
                            "spa",
                            "fra",
                            "deu",
                            "ita",
                            "por",
                            "rus",
                            "cmn",
                            "jpn",
                            "ara",
                            "nld",
                            "tur",
                            "kor",
                            "pol",
                            "ukr",
                            "ron",
                            "ell",
                            "heb",
                            "ces",
                            "hin",
                            "tha",
                            "vie",
                            "ind",
                            "msa",
                            "ben",
                            "fin",
                            "swe",
                            "nor",
                            "dan",
                            "hun",
                            "fil",
                            "yue",
                            "cat",
                            "epo",
                            "lat",
                            "fas",
                            "urd",
                            "tam",
                            "tel",
                            "mar",
                            "isl",
                            "slk",
                            "srp",
                            "hrv",
                            "bul",
                            "slv",
                            "est",
                            "lit",
                            "lav"
                        ],
                        "type": "string",
                        "description": "Target translation language (ISO 639-3 code). Leave empty for all available translations."
                    },
                    "tags": {
                        "title": "Tags",
                        "type": "array",
                        "description": "Filter sentences by Tatoeba tag names (e.g. proverb, idiom, greeting).",
                        "items": {
                            "type": "string"
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
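
The `fromLanguage` and `toLanguage` enums in the input schema above are almost identical, but `afr` appears only under `fromLanguage`. A client-side check before starting a run can catch that kind of mismatch early. The following is a minimal sketch, not part of the Actor: the helper name `check_input` and its message format are hypothetical, and the language lists are copied from the schema above, so they would need updating if the Actor's schema changes.

```python
# Language codes copied from the "fromLanguage" enum in the schema above.
FROM_LANGUAGES = frozenset("""
eng spa fra deu ita por rus cmn jpn ara nld tur kor pol ukr ron ell heb ces hin
tha vie ind msa ben fin swe nor dan hun fil yue cat epo lat fas urd tam tel mar
isl slk srp hrv bul slv est lit lav afr
""".split())

# The "toLanguage" enum lists the same codes except "afr".
TO_LANGUAGES = FROM_LANGUAGES - {"afr"}


def check_input(run_input):
    """Return a list of problems found in a candidate Actor input (hypothetical helper).

    Only the three properties visible in this part of the schema are checked.
    All of them are treated as optional, so a missing key is not reported.
    """
    problems = []

    src = run_input.get("fromLanguage")
    if src is not None and not (isinstance(src, str) and src in FROM_LANGUAGES):
        problems.append(f"fromLanguage: unsupported value {src!r}")

    dst = run_input.get("toLanguage")
    if dst is not None and not (isinstance(dst, str) and dst in TO_LANGUAGES):
        problems.append(f"toLanguage: unsupported value {dst!r}")

    tags = run_input.get("tags")
    if tags is not None and not (
        isinstance(tags, list) and all(isinstance(tag, str) for tag in tags)
    ):
        problems.append("tags: expected a list of strings")

    return problems
```

Calling `check_input({"fromLanguage": "afr", "toLanguage": "afr"})` reports a problem for `toLanguage` only, because `afr` is accepted as a source language but not as a target. An empty list means the three checked properties look consistent with the schema; it does not guarantee that the Actor will accept the input.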

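The `runsResponseSchema` above nests everything under a top-level `data` object. As a rough illustration of reading such a response, the sketch below pulls out the fields a caller typically needs next, such as `defaultDatasetId` for locating the results. The function name `summarize_run` and the sample values are hypothetical placeholders, and every field is read defensively because the schema does not mark any of them as required.

```python
def summarize_run(response):
    """Extract a few fields from a run response shaped like runsResponseSchema.

    Hypothetical helper: missing fields come back as None instead of raising.
    """
    data = response.get("data") or {}
    options = data.get("options") or {}
    return {
        "run_id": data.get("id"),
        "status": data.get("status"),
        "dataset_id": data.get("defaultDatasetId"),
        "cost_usd": data.get("usageTotalUsd"),
        "timeout_secs": options.get("timeoutSecs"),
    }


# Placeholder response for illustration only; the IDs are made up and the
# remaining values reuse the "example" entries from the schema above.
sample_response = {
    "data": {
        "id": "RUN_ID_PLACEHOLDER",
        "status": "READY",
        "defaultDatasetId": "DATASET_ID_PLACEHOLDER",
        "usageTotalUsd": 0.00005,
        "options": {"build": "latest", "timeoutSecs": 300, "memoryMbytes": 1024},
    }
}

summary = summarize_run(sample_response)
```

Here `summary["dataset_id"]` holds the identifier of the run's default dataset, which is where the scraped sentences would be stored once the run finishes.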