Under maintenance

Pricing

$0.005 / actor start

VoiceClonerTTS

High-quality text-to-speech API with voice cloning.

Developer

Lucy Paureau

Last modified: 2 months ago

Ultra Quality TTS – Voice Cloning API

High-quality text-to-speech API with voice cloning. Send your text and a short reference audio sample (URL or uploaded file) and get natural speech in the cloned voice. No code is required: run the Actor from Apify Console or call it through the Apify API. The generated audio is stored in the run's key-value store, and the run's dataset returns the key you need to download it.

Features

Voice cloning – Clone any voice from a ~3 second reference audio sample (URL or file upload).
Ultra quality TTS – Natural, realistic speech output.
Simple API – Input: text (max 2000 characters) + reference audio URL. Output: audio file (WAV) in key-value store (e.g. audio-<uuid>.wav).
No code – Use the Apify Console form, or trigger via API, webhooks, Make, Zapier, or any HTTP client.
Flexible input – Accepts S3 URLs, public URLs, or an Apify file upload as the reference audio.

Quick start

1. Input – Provide text (required) and reference_audio (required: URL or upload).
2. Run – Start the Actor from Apify Console or via the Apify API.
3. Output – In the run's dataset you get success and outputKey (e.g. audio-<uuid>.wav). Download the audio from the run's key-value store using that key.

Input / Parameters

Parameter	Required	Description
`text`	Yes	Text to synthesize into speech. Maximum 2000 characters.
`reference_audio`	Yes	URL of the reference audio (e.g. an S3 URL) or an uploaded file. About 3 seconds of clean speech is recommended.
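
The two rules in the table can be checked on the client before a run is started, which avoids paying for a run that would fail on input validation. The following is a minimal sketch, not part of the Actor: `build_input` and `MAX_TEXT_CHARS` are hypothetical names, and it assumes the Actor counts characters the same way Python's `len` does.

```python
MAX_TEXT_CHARS = 2000  # limit stated in the parameter table


def build_input(text: str, reference_audio: str) -> dict:
    """Return the Actor input dict, or raise ValueError if a documented rule is broken."""
    if not text or not text.strip():
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(
            f"text has {len(text)} characters; the maximum is {MAX_TEXT_CHARS}"
        )
    if not reference_audio or not reference_audio.strip():
        raise ValueError("reference_audio is required")
    # Surrounding whitespace in a pasted URL is a common mistake, so trim it.
    return {"text": text, "reference_audio": reference_audio.strip()}
```

The returned dict can be serialized with `json.dumps` and sent as the request body of a run.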

Output

Each run writes one item to the default dataset and one file to the default key-value store:

Dataset – { "success": true, "outputKey": "audio-<uuid>.wav" } (or success: false and error on failure).
Key-value store – WAV audio file under the key outputKey (e.g. audio-f47ac10b-58cc-4372-a567-0e02b2c3d479.wav). Use the Apify Key-Value Store API or the run's Storage tab to download the file.
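
As an illustration of how the two outputs fit together, the sketch below turns a dataset item into the download URL of the audio record. The function names are hypothetical; the URL shape follows the key-value store record endpoint used in the curl example later on this page, and the store ID is assumed to come from the run details.

```python
from urllib.parse import quote

API_BASE = "https://api.apify.com/v2"


def output_key_from_item(item: dict) -> str:
    """Return outputKey from a dataset item, or raise if the run reported a failure."""
    if not item.get("success"):
        # On failure the item carries an "error" field instead of an outputKey.
        raise RuntimeError(item.get("error", "run reported success: false"))
    return item["outputKey"]


def record_url(store_id: str, output_key: str) -> str:
    """Build the key-value store record URL for the generated WAV file."""
    return (
        f"{API_BASE}/key-value-stores/{quote(store_id, safe='')}"
        f"/records/{quote(output_key, safe='')}"
    )
```

The request to that URL still needs your API token, for example as the `token` query parameter shown in the curl examples.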

Example: Run via API

Option 1 – Run and get dataset items in one call (waits for completion, returns outputKey in the response):

curl -X POST "https://api.apify.com/v2/acts/lucymakeit~VoiceClonerTTS/run-sync-get-dataset-items?token=YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to our product. This is an example of voice cloning.",
    "reference_audio": "https://your-bucket.s3.region.amazonaws.com/path/to/reference.mp3"
  }'

Response is an array of dataset items, e.g. [{ "success": true, "outputKey": "audio-<uuid>.wav" }]. Use outputKey to download the audio from the run's key-value store.
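
The same synchronous call can be made from Python with only the standard library. This is a sketch under a few assumptions: the helper names are hypothetical, the token is sent as a Bearer `Authorization` header instead of the `token` query parameter used in the curl example, and the 300-second timeout is an arbitrary choice.

```python
import json
import urllib.request

ACTOR_ID = "lucymakeit~VoiceClonerTTS"  # taken from the curl example above


def build_sync_request(token: str, text: str, reference_audio: str) -> urllib.request.Request:
    """Build the POST request for the run-sync-get-dataset-items endpoint."""
    url = f"https://api.apify.com/v2/acts/{ACTOR_ID}/run-sync-get-dataset-items"
    body = json.dumps({"text": text, "reference_audio": reference_audio}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def run_sync(token: str, text: str, reference_audio: str, timeout: float = 300.0) -> list:
    """Send the request and return the parsed array of dataset items."""
    request = build_sync_request(token, text, reference_audio)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `run_sync("YOUR_APIFY_TOKEN", "Hello", "https://...")` would return a list such as the one shown above; the first item's `outputKey` identifies the audio file.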

Option 2 – Start the run asynchronously, wait or poll until it finishes, then fetch the dataset item and the audio:

# Start the run
curl -X POST "https://api.apify.com/v2/acts/lucymakeit~VoiceClonerTTS/runs?token=YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to our product. This is an example of voice cloning.",
    "reference_audio": "https://your-bucket.s3.region.amazonaws.com/path/to/reference.mp3"
  }'

The response includes data.id (the run ID). Once the run has finished, get the output:

# Get dataset items (contains outputKey)
curl "https://api.apify.com/v2/actor-runs/RUN_ID/dataset/items?token=YOUR_APIFY_TOKEN"

# Download the audio (use defaultKeyValueStoreId from run details and outputKey from dataset)
curl "https://api.apify.com/v2/key-value-stores/STORE_ID/records/OUTPUT_KEY?token=YOUR_APIFY_TOKEN" -o output.wav
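
The "poll or wait" step between starting the run and fetching its output could look like the sketch below. It assumes the run object returned by `GET /v2/actor-runs/RUN_ID` carries a `status` field and that the listed values are terminal. The helper names, the 5-second interval, and the poll limit are illustrative choices, not part of the Actor.

```python
import json
import time
import urllib.request

API_BASE = "https://api.apify.com/v2"
# Run statuses after which the run will not change any more (assumed list).
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"}


def is_finished(run: dict) -> bool:
    """Return True once the run object reports a terminal status."""
    return run.get("status") in TERMINAL_STATUSES


def get_json(url: str, token: str) -> dict:
    """GET a JSON document from the Apify API using a Bearer token."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_run(run_id: str, token: str, poll_seconds: float = 5.0, max_polls: int = 60) -> dict:
    """Poll the run until it finishes and return the final run object."""
    for _ in range(max_polls):
        run = get_json(f"{API_BASE}/actor-runs/{run_id}", token)["data"]
        if is_finished(run):
            return run
        time.sleep(poll_seconds)
    raise TimeoutError(f"run {run_id} did not finish after {max_polls} polls")
```

The returned run object includes `defaultKeyValueStoreId`, which is the STORE_ID needed for the download command above.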

Example: Input/Output

Input (JSON):

{
  "text": "Le contenu de ce site est le fruit du travail de journalistes qui s'engagent chaque jour pour vous apporter une information locale de qualité.",
  "reference_audio": "https://your-bucket.s3.eu-west-1.amazonaws.com/samples/voice-sample.mp3"
}

Output (dataset item):

{
  "success": true,
  "outputKey": "audio-a1b2c3d4-e5f6-7890-abcd-ef1234567890.wav"
}

The audio file is in the run's key-value store under outputKey.

Use cases

Audio content – Generate voiceovers for podcasts, videos, or social media in a consistent cloned voice.
Dubbing – Produce dubbed speech from text while keeping a target voice character.
Accessibility – Turn articles or scripts into natural speech with a chosen or cloned voice.
IVR & voicebots – Create custom TTS for hotlines or conversational AI without generic synthetic voices.
App personalization – Let users clone their voice for personalized assistants or messages.

FAQ

What audio format is supported for the reference?
URLs to common formats (e.g. MP3, WAV) work. About 3 seconds of clear speech without music or noise gives the best results.

Is there a maximum text length?
Yes. The text must not exceed 2000 characters. For longer content, split it into segments and run the Actor multiple times.
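
One way to split longer content is to break at sentence ends and pack sentences into segments that stay within the limit. The sketch below is an illustration, not the Actor's own logic: `split_text` is a hypothetical helper, the sentence detection is a simple punctuation-based regular expression, and a single sentence longer than the limit is cut at the limit even if that falls mid-word.

```python
import re

MAX_TEXT_CHARS = 2000  # limit stated in this FAQ answer


def split_text(text: str, limit: int = MAX_TEXT_CHARS) -> list[str]:
    """Split text into segments of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments = []
    current = ""
    for sentence in sentences:
        # A sentence that is itself too long is cut into limit-sized pieces.
        while len(sentence) > limit:
            if current:
                segments.append(current)
                current = ""
            segments.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate  # the sentence still fits in the open segment
        else:
            segments.append(current)  # close the open segment and start a new one
            current = sentence
    if current:
        segments.append(current)
    return segments
```

Each segment is then sent as the `text` of a separate run with the same `reference_audio`, and each run produces its own WAV file.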

What is the output audio format?
The file is stored as WAV under a key like audio-<uuid>.wav. Download it from the run's key-value store using that key.

Can I use S3 URLs for reference_audio?
Yes. Use a public URL or a pre-signed S3 URL that is accessible from the Actor (no auth required beyond the URL itself).

The run failed with "reference_audio is required".
Ensure your input includes both text and reference_audio (non-empty URL or an uploaded file via the file upload field).
