Hugging Face Audio AI
Pricing
from $0.01 / 1,000 results
Built by John Rippy | johnrippy.link
🏆 2025 Zapier Automation Hero of the Year: Project Phoenix, a 95-step AI sales pipeline cutting development time by 50%.
Audio AI Processing - Speech Recognition, TTS & Music Generation
Audio processing with Hugging Face models - speech recognition, text-to-speech, and audio analysis.
Features
- Speech-to-Text: Transcribe audio with Whisper models
- Text-to-Speech: Generate natural speech from text
- Audio Classification: Classify sounds, music, and speech
- Voice Activity Detection: Detect speech segments in audio
- Speaker Diarization: Identify different speakers
- Music Generation: Create music from text prompts
Supported Tasks
| Task | Description | Default Model |
|---|---|---|
| speech_to_text | Transcribe audio to text | openai/whisper-large-v3 |
| text_to_speech | Generate speech from text | microsoft/speecht5_tts |
| audio_classification | Classify audio content | MIT/ast-finetuned-audioset-10-10-0.4593 |
| audio_to_audio | Audio enhancement/conversion | speechbrain/sepformer-wham |
| voice_activity_detection | Detect speech in audio | pyannote/voice-activity-detection |
| speaker_diarization | Identify speakers | pyannote/speaker-diarization |
| audio_embeddings | Generate audio embeddings | facebook/wav2vec2-base-960h |
| music_generation | Generate music from prompts | facebook/musicgen-small |
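The table above can be read as a lookup from task name to default model. The following is a minimal sketch of that lookup, not the actor's documented input schema: the helper name `resolve_model` and the optional `model` override are assumptions made for illustration.

```python
from typing import Optional

# Default model per task, copied from the Supported Tasks table.
DEFAULT_MODELS = {
    "speech_to_text": "openai/whisper-large-v3",
    "text_to_speech": "microsoft/speecht5_tts",
    "audio_classification": "MIT/ast-finetuned-audioset-10-10-0.4593",
    "audio_to_audio": "speechbrain/sepformer-wham",
    "voice_activity_detection": "pyannote/voice-activity-detection",
    "speaker_diarization": "pyannote/speaker-diarization",
    "audio_embeddings": "facebook/wav2vec2-base-960h",
    "music_generation": "facebook/musicgen-small",
}


def resolve_model(task: str, model: Optional[str] = None) -> str:
    """Hypothetical helper: return `model` if given, else the task's default."""
    if task not in DEFAULT_MODELS:
        raise ValueError(
            f"Unknown task {task!r}; expected one of {sorted(DEFAULT_MODELS)}"
        )
    # An explicit model wins; otherwise fall back to the table's default.
    return model or DEFAULT_MODELS[task]
```

For example, `resolve_model("speech_to_text")` falls back to the Whisper default, while `resolve_model("music_generation", "facebook/musicgen-medium")` keeps the explicit choice.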
Popular Models
| Category | Models |
|---|---|
| Transcription | openai/whisper-large-v3, openai/whisper-medium, facebook/wav2vec2-large-960h-lv60-self |
| TTS | microsoft/speecht5_tts, facebook/mms-tts-eng, suno/bark |
| Classification | MIT/ast-finetuned-audioset-10-10-0.4593, superb/hubert-large-superb-er |
| Music | facebook/musicgen-small, facebook/musicgen-medium |
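The same catalog can be kept as a small data structure, for instance to check which category a model ID belongs to before passing it along. This is only a sketch: `category_of` and `split_model_id` are hypothetical helper names, and the one thing assumed about the IDs is the usual Hugging Face `namespace/name` form seen in the table.

```python
from typing import Optional, Tuple

# Popular models per category, copied from the table above.
POPULAR_MODELS = {
    "Transcription": [
        "openai/whisper-large-v3",
        "openai/whisper-medium",
        "facebook/wav2vec2-large-960h-lv60-self",
    ],
    "TTS": ["microsoft/speecht5_tts", "facebook/mms-tts-eng", "suno/bark"],
    "Classification": [
        "MIT/ast-finetuned-audioset-10-10-0.4593",
        "superb/hubert-large-superb-er",
    ],
    "Music": ["facebook/musicgen-small", "facebook/musicgen-medium"],
}


def category_of(model_id: str) -> Optional[str]:
    """Return the category listing `model_id`, or None if it is not listed."""
    for category, models in POPULAR_MODELS.items():
        if model_id in models:
            return category
    return None


def split_model_id(model_id: str) -> Tuple[str, str]:
    """Split a 'namespace/name' model ID into its two parts."""
    namespace, _, name = model_id.partition("/")
    return namespace, name
```

A model that is not in the table simply yields `None` from `category_of`; that says nothing about whether the actor accepts it.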
Pricing
$0.01 per API call
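As a rough budgeting aid, the per-call rate can be turned into a cost estimate. The sketch below assumes the flat $0.01 per API call stated in this section; the listing header quotes "from $0.01 / 1,000 results", so actual billing may differ. The function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

# Assumed flat rate in USD, taken from the Pricing section above.
PRICE_PER_CALL = Decimal("0.01")


def estimate_cost(api_calls: int) -> Decimal:
    """Estimate the USD cost of `api_calls` calls at the assumed flat rate."""
    if api_calls < 0:
        raise ValueError("api_calls must be non-negative")
    # Decimal avoids the rounding drift of binary floats for currency.
    return PRICE_PER_CALL * api_calls
```

Under that assumption, 250 calls would come to `estimate_cost(250)`, that is, $2.50.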
Part of the Hugging Face AI Suite - Text, Image, Audio, and Hub actors for comprehensive AI capabilities.