VoiceClonerTTS
Pricing
from $0.005 / actor start
Developer

Lucy Paureau
Ultra Quality TTS – Voice Cloning API
High-quality text-to-speech API with voice cloning. Send your text and a short reference audio (URL or upload) and get natural speech in the cloned voice. No code is required: run the Actor from Apify Console or call it through the Apify API. The audio file is stored in the run's key-value store, and its key is returned in the dataset.
Features
- Voice cloning – Clone any voice from a ~3 second reference audio sample (URL or file upload).
- Ultra quality TTS – Natural, realistic speech output.
- Simple API – Input: text (max 2000 characters) + reference audio URL. Output: audio file (WAV) in key-value store (e.g. audio-<uuid>.wav).
- No code – Use the Apify Console form, or trigger via API, webhooks, Make, Zapier, or any HTTP client.
- Flexible input – Accepts S3 URLs, public URLs, or an Apify file upload as the reference audio.
Quick start
- Input – Provide text (required) and reference_audio (required: URL or upload).
- Run – Start the Actor from Apify Console or via the Apify API.
- Output – In the run's dataset you get success and outputKey (e.g. audio-<uuid>.wav). Download the audio from the run's key-value store using that key.
Input / Parameters
| Parameter | Required | Description |
|---|---|---|
| text | Yes | Text to synthesize into speech. Maximum 2000 characters. |
| reference_audio | Yes | URL of the reference audio (e.g. S3) or upload a file. About 3 seconds of clean speech recommended. |
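The two rules in the table can be checked on the client before a run is started, which avoids paying for a run that is bound to fail. The following is a minimal sketch, not the Actor's own validation: the helper name `build_input` is hypothetical, and it assumes the 2000-character limit is counted the way Python's `len` counts characters, which the source does not specify.

```python
MAX_TEXT_CHARS = 2000  # limit stated in the parameter table


def build_input(text: str, reference_audio: str) -> dict:
    """Return the Actor input dict, or raise ValueError if a documented rule is broken.

    Hypothetical client-side helper; the Actor may apply further checks of its own.
    """
    if not text or not text.strip():
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(
            f"text has {len(text)} characters; the maximum is {MAX_TEXT_CHARS}"
        )
    if not reference_audio or not reference_audio.strip():
        raise ValueError("reference_audio is required")
    return {"text": text, "reference_audio": reference_audio.strip()}
```

The returned dict can be serialized to JSON and used as the request body of either API call shown later.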
Output
Each run writes one item to the default dataset and one file to the default key-value store:
- Dataset – { "success": true, "outputKey": "audio-<uuid>.wav" } (or success: false and error on failure).
- Key-value store – WAV audio file under the key outputKey (e.g. audio-f47ac10b-58cc-4372-a567-0e02b2c3d479.wav). Use the Apify Key-Value Store API or the run's Storage tab to download the file.
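To make the link between the dataset item and the stored file concrete, here is a small sketch that turns one dataset item into a download URL. The function name `audio_record_url` is hypothetical; the URL shape follows the key-value store record endpoint used in the curl example in this document, and the store ID is assumed to come from the run details.

```python
from urllib.parse import quote, urlencode

API_BASE = "https://api.apify.com/v2"


def audio_record_url(item: dict, store_id: str, token: str) -> str:
    """Build the download URL for the WAV file named by a dataset item.

    Raises RuntimeError when the item reports success: false.
    """
    if not item.get("success"):
        message = item.get("error", "unknown error")
        raise RuntimeError(f"Run reported failure: {message}")
    key = quote(item["outputKey"], safe="")  # escape the key for use in a URL path
    query = urlencode({"token": token})
    return f"{API_BASE}/key-value-stores/{store_id}/records/{key}?{query}"
```

Checking `success` before reading `outputKey` matters because a failed run carries `error` instead of a key.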
Example: Run via API
Option 1 – Run and get dataset items in one call (waits for completion, returns outputKey in the response):
curl -X POST "https://api.apify.com/v2/acts/lucymakeit~VoiceClonerTTS/run-sync-get-dataset-items?token=YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to our product. This is an example of voice cloning.",
    "reference_audio": "https://your-bucket.s3.region.amazonaws.com/path/to/reference.mp3"
  }'
Response is an array of dataset items, e.g. [{ "success": true, "outputKey": "audio-<uuid>.wav" }]. Use outputKey to download the audio from the run's key-value store.
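For callers who prefer a script to curl, the one-call request above can be sketched with Python's standard library. This is an illustrative translation, not an official client: the function names are hypothetical, the Actor ID is copied from the curl example, and the 300-second timeout is an arbitrary choice rather than a documented limit.

```python
import json
import urllib.request
from urllib.parse import urlencode

ACTOR_ID = "lucymakeit~VoiceClonerTTS"  # taken from the curl example


def build_sync_request(token: str, text: str, reference_audio: str) -> urllib.request.Request:
    """Prepare (but do not send) the run-sync-get-dataset-items request."""
    url = (
        f"https://api.apify.com/v2/acts/{ACTOR_ID}/run-sync-get-dataset-items?"
        + urlencode({"token": token})
    )
    body = json.dumps({"text": text, "reference_audio": reference_audio}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def run_sync(token: str, text: str, reference_audio: str, timeout: float = 300.0) -> list:
    """Send the request and return the parsed list of dataset items."""
    request = build_sync_request(token, text, reference_audio)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `run_sync("YOUR_APIFY_TOKEN", "Hello", "https://.../reference.mp3")` should return a list whose first item holds `success` and `outputKey`, as described above.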
Option 2 – Start the run asynchronously, wait or poll until it finishes, then fetch the dataset item and the audio file:
# Start the run
curl -X POST "https://api.apify.com/v2/acts/lucymakeit~VoiceClonerTTS/runs?token=YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to our product. This is an example of voice cloning.",
    "reference_audio": "https://your-bucket.s3.region.amazonaws.com/path/to/reference.mp3"
  }'
The response includes data.id (the run ID). Once the run has finished, get the output:
# Get dataset items (contains outputKey)
curl "https://api.apify.com/v2/actor-runs/RUN_ID/dataset/items?token=YOUR_APIFY_TOKEN"

# Download the audio (use defaultKeyValueStoreId from run details and outputKey from dataset)
curl "https://api.apify.com/v2/key-value-stores/STORE_ID/records/OUTPUT_KEY?token=YOUR_APIFY_TOKEN" -o output.wav
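The "poll or wait" step between starting the run and reading its output is not shown in the commands above. A minimal polling sketch follows. It assumes that the run details are served at `GET /v2/actor-runs/RUN_ID`, that the response carries `data.status` and `data.defaultKeyValueStoreId`, and that the listed status names are the terminal ones; these reflect the Apify API as commonly documented, so verify them against the current API reference. The function names, the poll interval, and the poll limit are hypothetical choices.

```python
import json
import time
import urllib.request
from typing import Callable

API_BASE = "https://api.apify.com/v2"
# Assumed terminal run statuses; check the Apify API reference before relying on them.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED-OUT", "ABORTED"}


def fetch_json(url: str) -> dict:
    """GET a URL and parse the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def wait_for_run(
    run_id: str,
    token: str,
    fetch: Callable[[str], dict] = fetch_json,
    poll_seconds: float = 5.0,
    max_polls: int = 120,
) -> dict:
    """Poll the run until it reaches a terminal status and return its details."""
    url = f"{API_BASE}/actor-runs/{run_id}?token={token}"
    for _ in range(max_polls):
        run = fetch(url)["data"]
        if run["status"] in TERMINAL_STATUSES:
            return run
        time.sleep(poll_seconds)
    raise TimeoutError(f"Run {run_id} did not finish after {max_polls} polls")
```

The returned dict supplies the store ID for the download step, and its `status` should be checked for `SUCCEEDED` before the dataset is read. The `fetch` parameter exists so the loop can be exercised without network access.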
Example: Input/Output
Input (JSON):
{
  "text": "Le contenu de ce site est le fruit du travail de journalistes qui s'engagent chaque jour pour vous apporter une information locale de qualité.",
  "reference_audio": "https://your-bucket.s3.eu-west-1.amazonaws.com/samples/voice-sample.mp3"
}
Output (dataset item):
{
  "success": true,
  "outputKey": "audio-a1b2c3d4-e5f6-7890-abcd-ef1234567890.wav"
}
The audio file is in the run's key-value store under outputKey.
Use cases
- Audio content – Generate voiceovers for podcasts, videos, or social media in a consistent cloned voice.
- Dubbing – Produce dubbed speech from text while keeping a target voice character.
- Accessibility – Turn articles or scripts into natural speech with a chosen or cloned voice.
- IVR & voicebots – Create custom TTS for hotlines or conversational AI without generic synthetic voices.
- App personalization – Let users clone their voice for personalized assistants or messages.
FAQ
What audio format is supported for the reference?
URLs to common formats (e.g. MP3, WAV) work. About 3 seconds of clear speech without music or noise gives the best results.
Is there a maximum text length?
Yes. The text must not exceed 2000 characters. For longer content, split it into segments and run the Actor multiple times.
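One way to do the splitting is sketched below. It is an illustration only: `split_text` is a hypothetical helper, the sentence detection is a simple regular expression that will misjudge abbreviations, and the limit is assumed to be counted in Python characters.

```python
import re

MAX_TEXT_CHARS = 2000  # limit stated in this FAQ


def split_text(text: str, limit: int = MAX_TEXT_CHARS) -> list[str]:
    """Split text into segments of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-wrap a single sentence that is longer than the limit on its own.
        while len(sentence) > limit:
            if current:
                segments.append(current)
                current = ""
            segments.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate  # the sentence still fits in the open segment
        else:
            segments.append(current)  # close the open segment and start a new one
            current = sentence
    if current:
        segments.append(current)
    return segments
```

Each returned segment becomes the text of one run with the same reference_audio; joining the resulting WAV files is left to the caller.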
What is the output audio format?
The file is stored as WAV under a key like audio-<uuid>.wav. Download it from the run's default key-value store using that key.
Can I use S3 URLs for reference_audio?
Yes. Use a public URL or a pre-signed S3 URL that is accessible from the Actor (no auth required beyond the URL itself).
The run failed with "reference_audio is required".
Ensure your input includes both text and reference_audio (non-empty URL or an uploaded file via the file upload field).