Brainiall Speech MCP Server
Pricing
from $20.00 / 1,000 tool calls
Production speech AI tools for AI agents: Brainiall Pronunciation (0-100 phoneme/word/sentence scoring), Brainiall Speech (transcription with timestamps), Brainiall Speech Pro (99 languages + diarization), Brainiall Voice (12 voices). Sub-second p50 latency for pronunciation and transcription.
Developer
Fabio Suizu
Maintained by Community
Last modified
20 days ago
Production speech AI tools for MCP-enabled AI agents: pronunciation scoring, speech-to-text, text-to-speech, and multilingual transcription — powered by the Brainiall Speech engines.
Tools
| Tool | Description |
|---|---|
| assess_pronunciation | Brainiall Pronunciation: score English pronunciation from audio (0-100 at overall, sentence, word, phoneme levels) |
| transcribe_audio | Brainiall Speech: convert spoken English to text with word-level timestamps |
| synthesize_speech | Brainiall Voice: generate natural speech from text (12 voices, American & British) |
| transcribe_audio_pro | Brainiall Speech Pro: 99 languages, speaker diarization |
| list_tts_voices | List available Brainiall Voice voices |
| check_pronunciation_service | Health check for the pronunciation engine |
| check_stt_service | Health check for the transcription engine |
| check_tts_service | Health check for the voice engine |
| check_whisper_service | Health check for the Speech Pro engine |
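MCP clients invoke tools through a JSON-RPC 2.0 `tools/call` request. The sketch below shows how such a request body could be assembled for the tools in the table. The helper name `build_tool_call` and the `KNOWN_TOOLS` set are illustrative and not part of this server; an MCP client library would normally build this message for you.

```python
from itertools import count
from typing import Optional

# Tool names copied from the table above.
KNOWN_TOOLS = {
    "assess_pronunciation", "transcribe_audio", "synthesize_speech",
    "transcribe_audio_pro", "list_tts_voices", "check_pronunciation_service",
    "check_stt_service", "check_tts_service", "check_whisper_service",
}

_request_ids = count(1)  # JSON-RPC requests need an id to match responses


def build_tool_call(name: str, arguments: Optional[dict] = None) -> dict:
    """Return a JSON-RPC 2.0 request body for the MCP `tools/call` method."""
    if name not in KNOWN_TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


# A health check is assumed here to take no arguments.
request = build_tool_call("check_stt_service")
```

Checking the name locally catches typos before a billable call is made.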
Pronunciation Scoring
Returns scores (0-100) at four granularity levels:
| Level | Description |
|---|---|
| Overall | Global pronunciation quality |
| Sentence | Sentence-level fluency and accuracy |
| Word | Per-word pronunciation scores |
| Phoneme | Individual sound accuracy (IPA + ARPAbet) |
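To illustrate how the four levels might be consumed, here is a minimal sketch that picks out low-scoring words and the weakest phoneme. The response shape (`overall`, `sentences`, `words`, `phonemes`, `ipa`, `arpabet`, `score`) is an assumption made for this example, and the numbers are invented sample values; check the real tool output before relying on any field name.

```python
# Hypothetical result shape with made-up scores, for illustration only.
sample_result = {
    "overall": 71,
    "sentences": [{"text": "quick fox", "score": 71}],
    "words": [
        {"word": "quick", "score": 90, "phonemes": [
            {"ipa": "k", "arpabet": "K", "score": 92},
            {"ipa": "w", "arpabet": "W", "score": 88},
            {"ipa": "ɪ", "arpabet": "IH", "score": 85},
            {"ipa": "k", "arpabet": "K", "score": 95},
        ]},
        {"word": "fox", "score": 52, "phonemes": [
            {"ipa": "f", "arpabet": "F", "score": 80},
            {"ipa": "ɑ", "arpabet": "AA", "score": 41},
            {"ipa": "k", "arpabet": "K", "score": 70},
            {"ipa": "s", "arpabet": "S", "score": 66},
        ]},
    ],
}


def weak_words(result: dict, threshold: int = 60) -> list:
    """Words whose word-level score falls below the threshold."""
    return [w["word"] for w in result["words"] if w["score"] < threshold]


def weakest_phoneme(result: dict) -> tuple:
    """Return (score, word, ipa) for the lowest-scoring phoneme."""
    return min(
        (p["score"], w["word"], p["ipa"])
        for w in result["words"]
        for p in w["phonemes"]
    )
```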
Performance
- Accuracy: Exceeds human inter-annotator agreement, measured by Pearson correlation coefficient (PCC 0.590 vs 0.555)
- Validated: 9,259 utterances from speakers of 7 first-language (L1) backgrounds, zero errors
- Latency: Sub-second p50 for pronunciation and transcription
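For readers unfamiliar with the metric, PCC (Pearson correlation coefficient) measures how closely two sets of scores move together, from -1 to 1. The sketch below is a plain-Python implementation of the standard formula; it is not the evaluation code behind the figures above, and the function name `pearson` is chosen for this example.

```python
import math


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and each list's sum of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

In an evaluation like the one described, `xs` would hold model scores and `ys` human ratings for the same utterances.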
How to Use
Connect to this Actor in MCP server (standby) mode and call the tools above. Audio is passed to the tools, and returned by them, as base64-encoded strings.
Example: Pronunciation Assessment
{"audio_base64": "<base64-encoded-audio>", "text": "The quick brown fox jumps over the lazy dog"}
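The arguments above can be produced from raw audio bytes with the standard library. This is a minimal sketch: the field names `audio_base64` and `text` come from the example, while the helper name `pronunciation_arguments` is hypothetical.

```python
import base64


def pronunciation_arguments(audio: bytes, reference_text: str) -> dict:
    """Build the argument object for assess_pronunciation from raw audio bytes."""
    if not audio:
        raise ValueError("audio is empty")
    return {
        # b64encode returns bytes; decode to str so the object is JSON-serializable.
        "audio_base64": base64.b64encode(audio).decode("ascii"),
        "text": reference_text,
    }


# In practice the bytes would come from a file, e.g. Path("clip.wav").read_bytes().
arguments = pronunciation_arguments(
    b"RIFF-placeholder-bytes",
    "The quick brown fox jumps over the lazy dog",
)
```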
Example: Text-to-Speech
{"text": "Hello, how are you today?", "voice": "af_heart", "speed": 1.0}
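A sketch of building these arguments and decoding the returned audio follows. The request fields (`text`, `voice`, `speed`) match the example; the response field name `audio_base64` is an assumption, as are both helper names, so inspect the actual tool result before using this.

```python
import base64


def tts_arguments(text: str, voice: str = "af_heart", speed: float = 1.0) -> dict:
    """Build the argument object for synthesize_speech."""
    if not text.strip():
        raise ValueError("text is empty")
    if speed <= 0:
        raise ValueError("speed must be positive")
    return {"text": text, "voice": voice, "speed": speed}


def decode_audio(result: dict, field: str = "audio_base64") -> bytes:
    """Decode base64 audio from a tool result.

    The default field name is an assumption, not a documented part of the API.
    """
    return base64.b64decode(result[field], validate=True)
```

Valid voice identifiers can be obtained from `list_tts_voices` rather than hard-coded.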
Pricing
$0.02 per tool call (pay-per-event).
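As a rough budgeting aid, the per-call price can be multiplied out. This sketch assumes every tool call is billed at the flat $0.02 rate stated above; the function name `estimate_cost` is illustrative.

```python
from decimal import Decimal

PRICE_PER_CALL = Decimal("0.02")  # USD, from the pricing line above


def estimate_cost(tool_calls: int) -> Decimal:
    """Estimated USD cost for a number of tool calls at the flat rate."""
    if tool_calls < 0:
        raise ValueError("tool_calls must be non-negative")
    return PRICE_PER_CALL * tool_calls


print(estimate_cost(1000))  # prints 20.00
```

`Decimal` is used instead of `float` so that currency amounts do not pick up binary rounding errors.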
Technical Details
- Engines: Brainiall Pronunciation, Brainiall Speech, Brainiall Speech Pro, Brainiall Voice
- Audio: Supports WAV, MP3, OGG, FLAC, WebM
- Backend: Brainiall production API (https://api.brainiall.com), auto-scaling
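A client can screen files against the supported formats before encoding and sending them. The sketch below checks only the file extension, which is a heuristic; how the backend actually detects formats is not documented here, and the helper name `is_supported_audio` is hypothetical.

```python
from pathlib import PurePath

# Extensions for the formats listed above.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".ogg", ".flac", ".webm"}


def is_supported_audio(filename: str) -> bool:
    """True if the file extension matches one of the listed audio formats."""
    return PurePath(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```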
More from Brainiall
This MCP server covers the speech tools. The full Brainiall platform offers 19 specialty AI APIs + 5 bundles (NLP, vision, documents, fraud, authenticity and more) under one key.