Hugging Face Audio AI

Pricing

from $0.01 / 1,000 results

Audio processing with Hugging Face models: speech recognition (speech-to-text), text-to-speech, audio classification, voice activity detection, speaker diarization, and music generation.

Built by John Rippy | johnrippy.link

🏆 2025 Zapier Automation Hero of the Year. Project Phoenix: a 95-step AI sales pipeline cutting development time by 50%.

Audio AI Processing - Speech Recognition, TTS & Music Generation

Audio processing with Hugging Face models - speech recognition, text-to-speech, and audio analysis.

Features

  • Speech-to-Text: Transcribe audio with Whisper models
  • Text-to-Speech: Generate natural speech from text
  • Audio Classification: Classify sounds, music, and speech
  • Voice Activity Detection: Detect speech segments in audio
  • Speaker Diarization: Identify different speakers
  • Music Generation: Create music from text prompts

Supported Tasks

| Task | Description | Default Model |
|---|---|---|
| speech_to_text | Transcribe audio to text | openai/whisper-large-v3 |
| text_to_speech | Generate speech from text | microsoft/speecht5_tts |
| audio_classification | Classify audio content | MIT/ast-finetuned-audioset-10-10-0.4593 |
| audio_to_audio | Audio enhancement/conversion | speechbrain/sepformer-wham |
| voice_activity_detection | Detect speech in audio | pyannote/voice-activity-detection |
| speaker_diarization | Identify speakers | pyannote/speaker-diarization |
| audio_embeddings | Generate audio embeddings | facebook/wav2vec2-base-960h |
| music_generation | Generate music from prompts | facebook/musicgen-small |

| Category | Models |
|---|---|
| Transcription | openai/whisper-large-v3, openai/whisper-medium, facebook/wav2vec2-large-960h-lv60-self |
| TTS | microsoft/speecht5_tts, facebook/mms-tts-eng, suno/bark |
| Classification | MIT/ast-finetuned-audioset-10-10-0.4593, superb/hubert-large-superb-er |
| Music | facebook/musicgen-small, facebook/musicgen-medium |
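
The task-to-default-model mapping listed above can be written down as a small lookup. The sketch below is illustrative only: the `task` and `model` input field names and the `build_input` helper are assumptions made for this example, not the actor's documented input schema, so check the actor's input definition before relying on them.

```python
# Default model per task, copied from the Supported Tasks list above.
DEFAULT_MODELS = {
    "speech_to_text": "openai/whisper-large-v3",
    "text_to_speech": "microsoft/speecht5_tts",
    "audio_classification": "MIT/ast-finetuned-audioset-10-10-0.4593",
    "audio_to_audio": "speechbrain/sepformer-wham",
    "voice_activity_detection": "pyannote/voice-activity-detection",
    "speaker_diarization": "pyannote/speaker-diarization",
    "audio_embeddings": "facebook/wav2vec2-base-960h",
    "music_generation": "facebook/musicgen-small",
}


def build_input(task, model=None, **extra):
    """Build a hypothetical actor input dict for one task.

    The "task" and "model" keys are assumed field names, not a
    documented schema. Any extra keyword arguments are passed through.
    """
    if task not in DEFAULT_MODELS:
        raise ValueError(f"Unsupported task: {task!r}")
    # Fall back to the task's default model when none is given.
    payload = {"task": task, "model": model or DEFAULT_MODELS[task]}
    payload.update(extra)
    return payload


if __name__ == "__main__":
    print(build_input("speech_to_text"))
    print(build_input("music_generation", model="facebook/musicgen-medium"))
```

Keeping the defaults in one dictionary lets a caller override the model only when needed, for example swapping `facebook/musicgen-small` for `facebook/musicgen-medium`.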

Pricing

$0.01 per API call
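
As a rough budgeting aid, the per-call price above can be turned into a cost estimate. This is a minimal sketch that assumes a flat rate of $0.01 for every call with no minimums, tiers, or platform fees; `estimate_cost` is a hypothetical helper, and actual billing may differ.

```python
from decimal import Decimal

# Flat per-call price taken from the Pricing section (assumed to apply to every call).
PRICE_PER_CALL_USD = Decimal("0.01")


def estimate_cost(num_calls):
    """Return the estimated cost in USD for a number of API calls."""
    if num_calls < 0:
        raise ValueError("num_calls must be non-negative")
    # Decimal avoids binary floating-point rounding in currency math.
    return PRICE_PER_CALL_USD * num_calls


if __name__ == "__main__":
    print(estimate_cost(250))  # prints 2.50
```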


Part of the Hugging Face AI Suite - Text, Image, Audio, and Hub actors for comprehensive AI capabilities.