VoiceClonerTTS

Under maintenance

Pricing

from $0.005 / actor start

Rating

0.0

(0)

Developer

Lucy Paureau

Maintained by Community

Actor stats

  • Bookmarked: 1
  • Total users: 1
  • Monthly active users: 0
  • Last modified: 9 days ago

Ultra Quality TTS – Voice Cloning API

High-quality text-to-speech API with voice cloning. Send your text and a short reference audio sample (as a URL or an uploaded file) and get natural speech in the cloned voice. No code is required: run the Actor from Apify Console or call the Apify API. The audio file is stored in the run's key-value store, and its key is returned in the run's dataset.

Features

  • Voice cloning – Clone any voice from a ~3 second reference audio sample (URL or file upload).
  • Ultra quality TTS – Natural, realistic speech output.
  • Simple API – Input: text (max 2000 characters) + reference audio URL. Output: audio file (WAV) in key-value store (e.g. audio-<uuid>.wav).
  • No code – Use the Apify Console form, or trigger via API, webhooks, Make, Zapier, or any HTTP client.
  • Flexible input – The reference audio can be an S3 URL, any public URL, or a file uploaded through Apify.

Quick start

  1. Input – Provide text (required) and reference_audio (required: URL or upload).
  2. Run – Start the Actor from Apify Console or via the Apify API.
  3. Output – In the run's dataset you get success and outputKey (e.g. audio-<uuid>.wav). Download the audio from the run's key-value store using that key.

Input / Parameters

| Parameter | Required | Description |
| --- | --- | --- |
| text | Yes | Text to synthesize into speech. Maximum 2000 characters. |
| reference_audio | Yes | URL of the reference audio (e.g. S3), or an uploaded file. About 3 seconds of clean speech is recommended. |
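
The two constraints in the parameter table can be checked on the client before a run is started, which avoids paying for a run that is bound to fail. The following is a minimal sketch and not part of the Actor: the helper name `build_input` and the constant `MAX_TEXT_CHARS` are hypothetical, it assumes the reference audio is passed as a URL string rather than uploaded, and it assumes the limit is counted in Unicode characters (the source does not say how the Actor counts).

```python
MAX_TEXT_CHARS = 2000  # limit stated in the parameter table


def build_input(text: str, reference_audio: str) -> dict:
    """Return the Actor input dict, or raise ValueError if a documented rule is broken."""
    if not text or not text.strip():
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_CHARS:
        # Assumption: the limit is counted in characters, not bytes.
        raise ValueError(
            f"text has {len(text)} characters; the maximum is {MAX_TEXT_CHARS}"
        )
    if not reference_audio or not reference_audio.strip():
        raise ValueError("reference_audio is required")
    return {"text": text, "reference_audio": reference_audio.strip()}
```

The returned dict can be serialized as the JSON body of any of the API calls shown in this document.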

Output

Each run writes one item to the default dataset and one file to the default key-value store:

  • Dataset – { "success": true, "outputKey": "audio-<uuid>.wav" } (or success: false and error on failure).
  • Key-value store – WAV audio file under the key outputKey (e.g. audio-f47ac10b-58cc-4372-a567-0e02b2c3d479.wav). Use the Apify Key-Value Store API or the run's Storage tab to download the file.
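
Because a failed run still writes a dataset item, client code should check success before using outputKey. A minimal sketch of that check follows; the function name `output_key_from_item` is hypothetical, and it assumes the failure item carries its message in the error field, as described above.

```python
def output_key_from_item(item: dict) -> str:
    """Return outputKey from a dataset item, raising if the run reported a failure."""
    if not item.get("success"):
        # On failure the item is expected to hold success: false and an error message.
        raise RuntimeError(item.get("error") or "run failed without an error message")
    key = item.get("outputKey")
    if not key:
        raise RuntimeError("dataset item has no outputKey")
    return key
```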

Example: Run via API

Option 1 – Run and get dataset items in one call (waits for completion, returns outputKey in the response):

curl -X POST "https://api.apify.com/v2/acts/lucymakeit~VoiceClonerTTS/run-sync-get-dataset-items?token=YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to our product. This is an example of voice cloning.",
    "reference_audio": "https://your-bucket.s3.region.amazonaws.com/path/to/reference.mp3"
  }'

The response is an array of dataset items, e.g. [{ "success": true, "outputKey": "audio-<uuid>.wav" }]. Use outputKey to download the audio from the run's key-value store.
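
For callers who prefer Python over curl, the same synchronous call can be sketched with the standard library alone. This is an illustrative sketch, not an official client: the function names `build_sync_request` and `run_sync` are hypothetical, and the 300-second timeout is an assumed value, since the source does not state how long a run takes.

```python
import json
import urllib.parse
import urllib.request

ACTOR_ID = "lucymakeit~VoiceClonerTTS"  # Actor ID taken from the curl example


def build_sync_request(token: str, text: str, reference_audio: str) -> urllib.request.Request:
    """Build (but do not send) the run-sync-get-dataset-items request."""
    query = urllib.parse.urlencode({"token": token})
    url = f"https://api.apify.com/v2/acts/{ACTOR_ID}/run-sync-get-dataset-items?{query}"
    body = json.dumps({"text": text, "reference_audio": reference_audio}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sync(token: str, text: str, reference_audio: str, timeout: float = 300.0) -> list:
    """Send the request and return the decoded array of dataset items."""
    request = build_sync_request(token, text, reference_audio)
    # Assumption: the run finishes within `timeout` seconds.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the URL and body easy to inspect or test without contacting the API.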

Option 2 – Start the run asynchronously, wait for it to finish, then fetch the dataset item and the audio file:

# Start the run
curl -X POST "https://api.apify.com/v2/acts/lucymakeit~VoiceClonerTTS/runs?token=YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to our product. This is an example of voice cloning.",
    "reference_audio": "https://your-bucket.s3.region.amazonaws.com/path/to/reference.mp3"
  }'

The response includes data.id (the run ID) and data.defaultKeyValueStoreId. Once the run has finished, fetch the output:

# Get dataset items (contains outputKey)
curl "https://api.apify.com/v2/actor-runs/RUN_ID/dataset/items?token=YOUR_APIFY_TOKEN"

# Download the audio (use defaultKeyValueStoreId from run details and outputKey from dataset)
curl "https://api.apify.com/v2/key-value-stores/STORE_ID/records/OUTPUT_KEY?token=YOUR_APIFY_TOKEN" -o output.wav
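
The asynchronous flow leaves the waiting step to the caller. One way to sketch it in Python is a polling loop around the run-details endpoint (GET /v2/actor-runs/RUN_ID), followed by building the download URL. The names `wait_for_run` and `record_url` are hypothetical, the poll interval and poll count are assumed values, and `get_run` is an injected callable so that the HTTP layer stays out of the sketch.

```python
import time
import urllib.parse
from typing import Callable

API_BASE = "https://api.apify.com/v2"
# Run statuses after which the run will not change any more.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"}


def record_url(store_id: str, output_key: str, token: str) -> str:
    """URL of the audio record in the run's default key-value store."""
    key = urllib.parse.quote(output_key, safe="")
    query = urllib.parse.urlencode({"token": token})
    return f"{API_BASE}/key-value-stores/{store_id}/records/{key}?{query}"


def wait_for_run(get_run: Callable[[str], dict], run_id: str,
                 poll_seconds: float = 2.0, max_polls: int = 150) -> dict:
    """Poll run details until the status is terminal, then return the run's data object.

    get_run(run_id) must return the parsed JSON of GET /v2/actor-runs/{run_id}.
    """
    for _ in range(max_polls):
        run = get_run(run_id)["data"]
        if run["status"] in TERMINAL_STATUSES:
            return run
        time.sleep(poll_seconds)
    raise TimeoutError(f"run {run_id} did not finish after {max_polls} polls")
```

After `wait_for_run` returns with status SUCCEEDED, the run's defaultKeyValueStoreId and the outputKey from the dataset item are the two values `record_url` needs.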

Example: Input/Output

Input (JSON):

{
  "text": "Le contenu de ce site est le fruit du travail de journalistes qui s'engagent chaque jour pour vous apporter une information locale de qualité.",
  "reference_audio": "https://your-bucket.s3.eu-west-1.amazonaws.com/samples/voice-sample.mp3"
}

Output (dataset item):

{
  "success": true,
  "outputKey": "audio-a1b2c3d4-e5f6-7890-abcd-ef1234567890.wav"
}

The audio file is in the run's key-value store under outputKey.

Use cases

  • Audio content – Generate voiceovers for podcasts, videos, or social media in a consistent cloned voice.
  • Dubbing – Produce dubbed speech from text while keeping a target voice character.
  • Accessibility – Turn articles or scripts into natural speech with a chosen or cloned voice.
  • IVR & voicebots – Create custom TTS for hotlines or conversational AI without generic synthetic voices.
  • App personalization – Let users clone their voice for personalized assistants or messages.

FAQ

What audio format is supported for the reference?
URLs to common formats (e.g. MP3, WAV) work. About 3 seconds of clear speech without music or noise gives the best results.

Is there a maximum text length?
Yes. The text must not exceed 2000 characters. For longer content, split it into segments and run the Actor multiple times.
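
Splitting longer content can be automated. The sketch below packs whole sentences into segments of at most 2000 characters and cuts only when a single sentence exceeds the limit. The function name `split_text` is hypothetical, the sentence detection is a rough heuristic, and joining the resulting audio files is outside the scope of this example.

```python
import re

MAX_TEXT_CHARS = 2000  # limit stated in the FAQ answer above


def split_text(text: str, limit: int = MAX_TEXT_CHARS) -> list[str]:
    """Split text into segments of at most `limit` characters, preferring sentence ends."""
    # Split after ., ! or ? followed by whitespace (a rough heuristic).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut hard at the limit.
        while len(sentence) > limit:
            if current:
                segments.append(current)
                current = ""
            segments.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            segments.append(current)
            current = sentence
    if current:
        segments.append(current)
    return segments
```

Each returned segment can then be sent as the text of a separate run with the same reference_audio.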

What is the output audio format?
The file is stored as WAV under a key like audio-<uuid>.wav. Download it from the run's key-value store, either through the Apify Key-Value Store API or from the run's Storage tab.

Can I use S3 URLs for reference_audio?
Yes. Use a public URL or a pre-signed S3 URL that is accessible from the Actor (no auth required beyond the URL itself).

The run failed with "reference_audio is required". What should I do?
Ensure your input includes both text and reference_audio (non-empty URL or an uploaded file via the file upload field).