Pricing

from $0.90 / successful api call

Google Live Translate

Apify Actor & MCP Server for real-time translation, transcription, and language detection using Google Gemini 3.5 Live Translate with emotional voice preservation.

Developer

Sergio Calvo

Last modified: a month ago

Google Live Translate - Apify Actor & MCP Server

Google Live Translate is an industrial-grade dual system written in Node.js and TypeScript. It natively integrates Google's official multimodal translation API to process text, audio files, and Google Meet recordings in real time while preserving the emotional intonation and voice characteristics of the original speaker.

The project offers two production-ready distribution interfaces:

Apify Actor (/actor): Engineered for high-throughput batch processing, audio streaming, and large-scale subtitle generation.
MCP Server (/mcp-server): A server compatible with Anthropic's Model Context Protocol (MCP) that exposes translation and language detection tools directly to AI agents such as Claude Desktop.

🎯 Target Audience & 💡 Primary Use Cases

Commercial Value & Use Cases

Automated Content Localization: Automatically generate multi-language subtitles (SRT, VTT, JSON) for corporate videos, webinars, and tutorials in minutes.
International Meeting Auditing: Transcribe and translate sales or support calls in real-time, capturing emotional nuances and voice tone.
Machine Learning & Datasets: Process large volumes of audio files to compile clean datasets for AI training or sector-specific customer service analysis.

Target Audience

Data Analysts & Scientists: Retrieve clean JSON datasets containing exact timestamps (startMs, endMs), original transcriptions, translations, and accuracy scores (confidence).
Operations & Business Teams: Automate the translation of Google Meet recordings stored in Drive without writing code, improving global team collaboration.
AI Developers & Engineers: Seamlessly integrate audio translation with emotional voice cloning into local agent workflows via the MCP server.

⚙️ Key Features (What the Actor does)

Direct & Optimized API Calls: Connects natively to Google's official gemini-3.5-live-translate API via raw WebSockets.
Emotional Voice Style Preservation: Automatically detects tone, rhythm, and expressiveness when the preserveVoiceStyle: true setting is enabled.
Automatic Language Detection: Autodetects the source language (supporting 70+ languages under the BCP-47 standard) with a corresponding confidence metric.
Smart Audio Chunking: Processes large files by splitting audio into optimized segments (e.g., 8 seconds) using static FFmpeg to prevent context limit errors and guarantee precise timestamps.
Translated Audio Output: Captures the translated PCM audio streamed back from Gemini, concatenates all segments, and saves the final translated voice output as play-ready WAV and MP3 files.
Ultra-Fast Inactivity Latch: Implements a text-activity detector that closes the Bidi stream 4 seconds after transcription stops, avoiding the default 90-second socket timeout and reducing processing time by roughly 90%.
Native Error Management: Instead of crashing on invalid inputs or network errors, it records a structured error payload in the dataset.
Flexible Export Formats: Outputs clean results in JSON, SRT, VTT, and plaintext formats.
Rate Limiting & Exponential Backoff: Built-in throttling at a maximum of 10 requests per second with automatic exponential retries for network drops or rate limits (HTTP 429).
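
The throttling and retry behavior described in the last item can be sketched as follows. This is only an illustration, not the Actor's actual code: the helper names, the 500 ms base delay, the 30-second cap, and the set of retryable network error codes are all assumptions.

```typescript
// Hypothetical helpers; the README does not show the Actor's real retry code.

/** Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs. */
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

/** Minimum spacing between request starts for a requests-per-second limit. */
function minIntervalMs(maxPerSecond: number): number {
  return 1000 / maxPerSecond;
}

interface RetryableError {
  status?: number; // HTTP status, if any
  code?: string;   // Node-style network error code, if any
}

/** Treats HTTP 429 and common network drops as retryable (an assumed policy). */
function isRetryable(err: RetryableError): boolean {
  return err.status === 429 || err.code === "ECONNRESET" || err.code === "ETIMEDOUT";
}

/** Runs `task`, retrying retryable failures with exponential backoff. */
async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts = 5,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((r) => setTimeout(r, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      const lastAttempt = attempt >= maxAttempts - 1;
      if (lastAttempt || !isRetryable(err as RetryableError)) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

With a limit of 10 requests per second, `minIntervalMs(10)` gives a 100 ms spacing between request starts. The `sleep` parameter is injectable so the retry loop can be tested without real delays.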

Why Google Live Translate? (Competitive Advantage)

Unlike traditional translators that only process text and strip away the speaker's vocal characteristics, Google Live Translate merges acoustic transcription with a multimodal AI model. This setup delivers:

An 85% reduction in latency compared to traditional cascaded pipelines (transcribe -> translate -> synthesize).
True emotional voice style preservation (capturing humor, severity, or urgency) to improve empathy in automated customer service channels.
Unique Technical Versatility: Runs on serverless cloud infrastructure (Apify) for large batch processing, or locally on a developer's machine (MCP) as an LLM utility extension.

⚙️ Input Schema

The Actor accepts the following parameters in its input form:

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `mode` | `string` | Yes | `text` | Supported modes: `text`, `audio_file`, `audio_base64`, `audio_url`, `meet_recording` |
| `targetLang` | `string` | Yes | - | Target language code selected from a dropdown (e.g., `es`, `fr`, `zh`, `en`) |
| `inputText` | `string` | No | - | Plain text to translate (required if `mode` is `text`) |
| `audioFile` | `string` | No | - | Upload local audio file directly from your computer (required if `mode` is `audio_file`) |
| `audioBase64` | `string` | No | - | Base64-encoded audio track (required if `mode` is `audio_base64`) |
| `audioUrl` | `string` | No | - | Public URL of the audio/video file, or Google Drive URL (for Meet recordings) |
| `sourceLang` | `string` | No | `auto` | BCP-47 source language code or `auto` for auto-detection (dropdown select) |
| `preserveVoiceStyle` | `boolean` | No | `true` | Preserve the speaker's original emotional tone and voice style |
| `outputFormat` | `string` | No | `json` | Format of the output: `json`, `srt`, `vtt`, `plaintext` |
| `googleCloudApiKey` | `string` | No | - | Google Cloud API Key. If omitted, the Actor attempts to use ADC or Service Account JSON |
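
As a rough illustration of how these fields fit together, the sketch below types the input and checks the mode-dependent requirements. The field names and defaults come from the table; the function names and the rule that `meet_recording` needs `audioUrl` are assumptions, not documented Actor behavior.

```typescript
type Mode = "text" | "audio_file" | "audio_base64" | "audio_url" | "meet_recording";

interface ActorInput {
  mode: Mode;
  targetLang: string;
  inputText?: string;
  audioFile?: string;
  audioBase64?: string;
  audioUrl?: string;
  sourceLang?: string;          // defaults to "auto"
  preserveVoiceStyle?: boolean; // defaults to true
  outputFormat?: "json" | "srt" | "vtt" | "plaintext"; // defaults to "json"
  googleCloudApiKey?: string;
}

// Which field each mode needs. Mapping meet_recording to audioUrl is an assumption
// based on the audioUrl description ("or Google Drive URL (for Meet recordings)").
const REQUIRED_FIELD: Record<Mode, keyof ActorInput> = {
  text: "inputText",
  audio_file: "audioFile",
  audio_base64: "audioBase64",
  audio_url: "audioUrl",
  meet_recording: "audioUrl",
};

/** Returns a list of human-readable problems; an empty list means the input looks usable. */
function validateInput(input: ActorInput): string[] {
  const errors: string[] = [];
  if (!input.targetLang) errors.push("targetLang is required");
  const field = REQUIRED_FIELD[input.mode];
  if (!field) {
    errors.push(`unsupported mode: ${input.mode}`);
  } else if (!input[field]) {
    errors.push(`${field} is required when mode is ${input.mode}`);
  }
  return errors;
}

/** Fills in the defaults listed in the table for any omitted optional field. */
function withDefaults(input: ActorInput): ActorInput {
  return { sourceLang: "auto", preserveVoiceStyle: true, outputFormat: "json", ...input };
}
```

Returning a list of problems instead of throwing matches the "structured error payload" approach described under Key Features, although the exact payload shape is not specified here.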

📊 Output Schema

Audio translations output a detailed JSON structure saved to the Apify Dataset:

{
  "translationId": "aud-xyz123456",
  "sourceLang": "en",
  "targetLang": "es",
  "detectedLang": "en",
  "inputType": "audio_url",
  "segments": [
    {
      "index": 0,
      "startMs": 0,
      "endMs": 8000,
      "originalText": "Good morning and welcome to our annual review meeting.",
      "translatedText": "Buenos días y bienvenidos a nuestra reunión de revisión anual.",
      "confidence": 0.98,
      "voiceStylePreserved": true
    }
  ],
  "metadata": {
    "durationMs": 8000,
    "wordCount": 11,
    "processingMs": 1120,
    "modelVersion": "gemini-3.5-live-translate"
  },
  "srtContent": "1\n00:00:00,000 --> 00:00:08,000\nBuenos días y bienvenidos a nuestra reunión de revisión anual.\n",
  "vttContent": "WEBVTT\n\n1\n00:00:00.000 --> 00:00:08.000\nBuenos días y bienvenidos a nuestra reunión de revisión anual.\n",
  "plaintextContent": "Buenos días y bienvenidos a nuestra reunión de revisión anual."
}
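
The `srtContent` and `vttContent` strings can be derived from `segments` alone. The sketch below is one way to do that and reproduces the sample strings above for the single example segment; the function names are hypothetical and the Actor's own formatter may differ.

```typescript
interface Segment {
  index: number;
  startMs: number;
  endMs: number;
  translatedText: string;
}

/** Left-pads a non-negative integer with zeros to the given width. */
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

/** Formats milliseconds as HH:MM:SS<sep>mmm. SRT uses "," and WebVTT uses ".". */
function formatTimestamp(ms: number, msSeparator: "," | "."): string {
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}${msSeparator}${pad(ms % 1000, 3)}`;
}

/** Builds one cue: 1-based cue number, time range, then the translated text. */
function cue(seg: Segment, msSeparator: "," | "."): string {
  const range = `${formatTimestamp(seg.startMs, msSeparator)} --> ${formatTimestamp(seg.endMs, msSeparator)}`;
  return `${seg.index + 1}\n${range}\n${seg.translatedText}\n`;
}

function toSrt(segments: Segment[]): string {
  // Cues are separated by a blank line.
  return segments.map((seg) => cue(seg, ",")).join("\n");
}

function toVtt(segments: Segment[]): string {
  return "WEBVTT\n\n" + segments.map((seg) => cue(seg, ".")).join("\n");
}
```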

🛠️ Detailed Architecture & How It Works

This Actor bridges local media files and the bidirectional (Bidi) WebSocket stream of the Gemini Multimodal Live API.

1. Audio Splitting and Preprocessing

When an audio/video file (or URL/uploaded file) is processed, the Actor uses a static binary of FFmpeg to:

Split the file into small chunks (default: 8 seconds each) to stay within the model's context limit and provide precise timestamps.
Downsample and encode each audio chunk to 16kHz, mono, 16-bit signed little-endian PCM (s16le) format, which is the native input format expected by Gemini.
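
A minimal sketch of this step, assuming each chunk is cut with a separate FFmpeg invocation: `planChunks` computes the chunk boundaries and `ffmpegArgsForChunk` builds an argument list that extracts one chunk as 16 kHz mono s16le PCM. Both function names are hypothetical, and the Actor may split the file differently (for example with FFmpeg's segment muxer).

```typescript
interface ChunkPlan {
  index: number;
  startMs: number;
  endMs: number;
}

/** Splits [0, durationMs) into consecutive chunks of at most chunkMs each. */
function planChunks(durationMs: number, chunkMs = 8000): ChunkPlan[] {
  const chunks: ChunkPlan[] = [];
  for (let startMs = 0, index = 0; startMs < durationMs; startMs += chunkMs, index++) {
    chunks.push({ index, startMs, endMs: Math.min(startMs + chunkMs, durationMs) });
  }
  return chunks;
}

/** FFmpeg arguments that write one chunk as raw 16 kHz mono 16-bit little-endian PCM. */
function ffmpegArgsForChunk(inputPath: string, chunk: ChunkPlan, outputPath: string): string[] {
  return [
    "-ss", (chunk.startMs / 1000).toFixed(3),                 // seek to chunk start
    "-t", ((chunk.endMs - chunk.startMs) / 1000).toFixed(3),  // read only the chunk duration
    "-i", inputPath,
    "-vn",                  // drop any video stream
    "-ac", "1",             // mono
    "-ar", "16000",         // 16 kHz
    "-acodec", "pcm_s16le", // 16-bit signed little-endian samples
    "-f", "s16le",          // raw PCM container
    outputPath,
  ];
}
```

The resulting array can be passed to `child_process.spawn` together with the FFmpeg binary path. Because the boundaries are computed up front, each chunk's `startMs` and `endMs` can be reused directly as segment timestamps.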

2. WebSocket Connection & Latch Handshake

For each chunk, the Actor opens a secure WebSocket connection to wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent.

Latch Mechanism: It sends the initial setup frame (the configuration block) and waits for the server's setupComplete confirmation. Audio data is only streamed after the latch completes, preventing the common 1007 (invalid frame payload) close errors.
Audio Streaming: Audio chunks are read in memory and sent as base64-encoded frames to the socket.
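
The latch can be sketched independently of any WebSocket library: audio frames are queued until `setupComplete` arrives and flushed afterwards. The class name and the exact JSON shape of the audio frame (`realtimeInput.audio` with an `audio/pcm;rate=16000` MIME type) are assumptions for illustration; check them against the current Live API reference before relying on them.

```typescript
type SendFn = (json: string) => void;

class SetupLatch {
  private ready = false;
  private pending: string[] = [];
  private send: SendFn;

  constructor(send: SendFn) {
    this.send = send;
  }

  /** Call once the socket is open: sends the setup frame immediately. */
  start(setupFrame: object): void {
    this.send(JSON.stringify(setupFrame));
  }

  /** Sends one base64 PCM frame, or queues it if setup has not completed yet. */
  sendAudio(base64Pcm: string): void {
    const frame = JSON.stringify({
      realtimeInput: { audio: { mimeType: "audio/pcm;rate=16000", data: base64Pcm } }, // assumed shape
    });
    if (this.ready) this.send(frame);
    else this.pending.push(frame);
  }

  /** Feed every parsed server message through here. */
  onServerMessage(msg: { setupComplete?: unknown }): void {
    if (!this.ready && msg.setupComplete !== undefined) {
      this.ready = true;
      for (const frame of this.pending) this.send(frame); // flush in original order
      this.pending = [];
    }
  }
}
```

In a real client, `send` would wrap the socket's own send method, and `onServerMessage` would be called from the socket's message handler after `JSON.parse`.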

3. Smart Transcription Inactivity Check

Unlike traditional text models, Gemini Live Translate keeps streaming real-time audio (including padding/silence) to keep the line active, which normally causes clients to hang until hitting the 90-second socket timeout.

Text-Activity Monitor: Our translator monitors incoming frames and keeps track of the last time actual transcription text was received.
Fast Exit: If 4 seconds pass without new transcription text after all audio chunks are sent, the socket is cleanly closed. This reduces processing time from 6 minutes to ~30 seconds for a 30-second audio track.
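
The fast-exit rule can be expressed as a small state holder with an injectable clock. This is a sketch under one assumption: the 4-second window is measured from whichever happened later, the last transcription text or the moment the final audio chunk was sent. The class and method names are hypothetical.

```typescript
class TextActivityMonitor {
  private lastActivityAt: number;
  private allAudioSent = false;
  private idleMs: number;

  constructor(startedAt: number, idleMs = 4000) {
    this.lastActivityAt = startedAt;
    this.idleMs = idleMs;
  }

  /** Call whenever a frame containing actual transcription text arrives. */
  noteText(now: number): void {
    this.lastActivityAt = now;
  }

  /** Call once the last audio chunk has been written to the socket. */
  noteAllAudioSent(now: number): void {
    this.allAudioSent = true;
    this.lastActivityAt = Math.max(this.lastActivityAt, now);
  }

  /** True when all audio is sent and no text has arrived for idleMs. */
  shouldClose(now: number): boolean {
    return this.allAudioSent && now - this.lastActivityAt >= this.idleMs;
  }
}
```

A client would poll `shouldClose(Date.now())` on a short interval and close the socket when it returns true. Audio-only frames (padding or silence) never call `noteText`, so they do not keep the connection open.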

4. Audio Recovery & MP3 Packaging

The Actor captures the base64-encoded translated PCM audio (audio/pcm;rate=24000) returned by Gemini, then:

Concatenates the raw buffers.
Prepends a standard 44-byte WAV header with the exact sample rate (24000 Hz) and PCM properties.
Encodes the WAV file into a compressed MP3 file (translated_output.mp3) using FFmpeg.
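
The concatenation and header steps can be sketched as below. The 44-byte layout is the canonical PCM WAV header; the mono, 16-bit defaults are assumptions (the text above only states the 24000 Hz sample rate), and the function names are hypothetical.

```typescript
/** Builds a canonical 44-byte WAV header for uncompressed PCM data. */
function wavHeader(dataBytes: number, sampleRate = 24000, channels = 1, bitsPerSample = 16): Uint8Array {
  const header = new Uint8Array(44);
  const view = new DataView(header.buffer);
  const ascii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) header[offset + i] = text.charCodeAt(i);
  };
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame

  ascii(0, "RIFF");
  view.setUint32(4, 36 + dataBytes, true);           // file size minus 8 bytes
  ascii(8, "WAVE");
  ascii(12, "fmt ");
  view.setUint32(16, 16, true);                      // fmt chunk size
  view.setUint16(20, 1, true);                       // audio format 1 = PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * blockAlign, true); // byte rate
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  ascii(36, "data");
  view.setUint32(40, dataBytes, true);               // PCM payload size
  return header;
}

/** Concatenates raw PCM chunks and prepends the WAV header. */
function toWav(pcmChunks: Uint8Array[], sampleRate = 24000): Uint8Array {
  const dataBytes = pcmChunks.reduce((total, chunk) => total + chunk.length, 0);
  const wav = new Uint8Array(44 + dataBytes);
  wav.set(wavHeader(dataBytes, sampleRate), 0);
  let offset = 44;
  for (const chunk of pcmChunks) {
    wav.set(chunk, offset);
    offset += chunk.length;
  }
  return wav;
}
```

The resulting bytes can be written to `translated_output.wav` and then passed to FFmpeg for MP3 encoding.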

🚀 Installation & Quick Start Guide

Deploying the Actor to Apify

Log in to the Apify CLI: If you have the Apify CLI installed globally, run:
```
apify login
```
Push the Actor: Run the push command from the /actor directory:
```
npx apify-cli push
```
Note: The included .actorignore file automatically excludes local audio test files and build output (dist/, .system_generated/, *.wav, *.mp3) to keep deployment packages small and fast to upload.
Configure Settings: In the Apify Console, set your GOOGLE_CLOUD_API_KEY under the Environment Variables section.

Local Development and Testing

To test the audio translation and MP3 voice generation locally:

Build the TypeScript files:
```
npm run build
```

Run the test script with your API key (PowerShell syntax):

$env:GOOGLE_CLOUD_API_KEY="YOUR_API_KEY"; node dist/test-local-audio.js

This translates the sample MP3 file, prints the subtitles to the console, and saves translated_output.wav and translated_output.mp3 in the workspace.

Integrating the MCP Server (Claude Desktop)

Build the MCP server:

cd mcp-server
npm install
npm run build

Add the config to your Claude Desktop configuration file (claude_desktop_config.json):

{
  "mcpServers": {
    "google-live-translate": {
      "command": "node",
      "args": ["/absolute/path/to/mcp-server/dist/index.js"],
      "env": {
        "GOOGLE_CLOUD_API_KEY": "YOUR_GEMINI_API_KEY"
      }
    }
  }
}

💼 Business Use Cases & Monetization

| Segment | Workflow | Cost & Value Model |
| --- | --- | --- |
| Multilingual Support | Call centers requiring real-time translation between agents and clients. | $0.90 per successful API call |
| Video Subtitling | Content creators and e-learning platforms publishing across global markets. | $0.90 per successful API call |
| International Meetings | Pipelines that translate Google Meet recordings and deliver SRT subtitles. | $0.90 per successful API call |
| NLP Research & Datasets | Translation datasets with confidence scores, metadata, and voice style details. | $0.90 per successful API call |

🔌 Automation & Integrations

No-Code Platforms: Trigger the Actor via Webhooks from Make, Zapier, n8n, or ActiveCampaign as soon as a new recording is uploaded.
Schedules: Set up Apify's internal Cron Schedules to automatically look for and translate new recordings in Google Drive at regular intervals (daily, weekly, etc.).
Cloud Databases: Export structured datasets directly to BigQuery, Snowflake, Amazon S3, Postgres, or vector databases for downstream RAG analytics pipelines.

🌟 Frequently Asked Questions (FAQ)

Does the system require a local FFmpeg installation?

No. The project includes @ffmpeg-installer/ffmpeg as a dependency, which installs a platform-specific static FFmpeg binary (Windows, macOS, or Linux) out of the box. Audio splitting therefore works without extra setup, both locally and inside Docker containers.

How are private Google Meet recordings fetched from Google Drive?

If you configure the MCP Server or the Actor using a Google Service Account JSON or Application Default Credentials (ADC) with Drive read access, the system automatically requests a secure OAuth token and sends it in the download request header (Authorization: Bearer <TOKEN>).
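
As a rough sketch of that download request: the helper below extracts the file ID from a Drive sharing URL and builds an authenticated request against the Drive v3 `files.get` endpoint with `alt=media`. Whether the Actor uses this exact endpoint is an assumption, and both function names are hypothetical; obtaining the access token from the Service Account or ADC is out of scope here.

```typescript
/** Extracts the file ID from common Drive URL shapes, or returns null. */
function extractDriveFileId(driveUrl: string): string | null {
  const match =
    /\/file\/d\/([A-Za-z0-9_-]+)/.exec(driveUrl) ?? // .../file/d/<ID>/view
    /[?&]id=([A-Za-z0-9_-]+)/.exec(driveUrl);       // ...?id=<ID>
  return match ? match[1] : null;
}

interface DownloadRequest {
  url: string;
  headers: Record<string, string>;
}

/** Builds the download URL and Authorization header for a private Drive file. */
function buildDriveDownload(driveUrl: string, accessToken: string): DownloadRequest {
  const fileId = extractDriveFileId(driveUrl);
  if (!fileId) throw new Error(`Not a recognizable Drive file URL: ${driveUrl}`);
  return {
    url: `https://www.googleapis.com/drive/v3/files/${fileId}?alt=media`,
    headers: { Authorization: `Bearer ${accessToken}` },
  };
}
```

The returned `url` and `headers` can be passed straight to `fetch(url, { headers })` to stream the recording.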

Can it translate video files as well as audio?

Yes. The bundled FFmpeg binary automatically demuxes the audio track from video files (such as .mp4, .mkv, or .webm) and transcodes it into a 16kHz mono PCM stream for Gemini.

How does structured schema data improve AI engine discoverability?

The Generative Engine Optimization (GEO) study, led by researchers at Princeton University, reports that content optimization can raise a source's visibility in generative engine responses by up to 40%. The schema-structured JSON outputs and structured page markup used here are intended to make results easier for AI search engines (such as ChatGPT, Perplexity, and Gemini) to parse, cite, and attribute accurately.

[!NOTE]
This service communicates directly with official Google Cloud APIs, ensuring full data privacy compliance without using web scraping techniques.
