Speech AI MCP Server

AI-powered speech tools for MCP agents: pronunciation scoring (0-100 at phoneme, word, and sentence level), speech-to-text with word timestamps, and text-to-speech with 12 English voices. Median latency under 300 ms.

Pricing

Pay per usage

Developer

Fabio Suizu

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 3 hours ago

Pronunciation Assessment MCP Server

AI-powered English pronunciation scoring for MCP-enabled AI agents.

What it does

This MCP server provides pronunciation assessment tools that can be called by AI agents:

  • assess_pronunciation: Score pronunciation from audio (WAV/MP3/OGG/WebM, base64-encoded)
  • check_pronunciation_service: Health check for the backend service

Scoring Levels

Returns scores (0-100) at four granularity levels:

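| Level    | Description                        | Example Use            |
|----------|------------------------------------|------------------------|
| Overall  | Global pronunciation quality       | Quick assessment       |
| Sentence | Sentence-level fluency & accuracy  | Feedback on flow       |
| Word     | Per-word pronunciation scores      | Identify problem words |
| Phoneme  | Individual sound accuracy          | Detailed correction    |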

Performance

  • Accuracy: Exceeds human inter-annotator agreement on phoneme scoring (Pearson correlation coefficient, PCC, of 0.576 vs 0.555)
  • Validated: 9,259 utterances from speakers of 7 native-language backgrounds, zero errors
  • Latency: p50 = 257 ms, p95 = 423 ms

How to Use

With MCP Client

Connect to the MCP endpoint:

https://Ym2gS88TksnTdTcPq.apify.actor/mcp?token=YOUR_APIFY_TOKEN

Tool: assess_pronunciation

Input:

{
  "audio_base64": "<base64-encoded-audio>",
  "text": "The quick brown fox jumps over the lazy dog"
}
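
As a rough illustration of how a client might assemble this input, the sketch below base64-encodes raw audio bytes and wraps the arguments in a JSON-RPC 2.0 `tools/call` message, which is the message shape MCP uses for tool invocation. The helper name `build_assess_request` and the placeholder audio bytes are assumptions for illustration; an MCP client library would normally build and send this envelope for you.

```python
import base64
import json


def build_assess_request(audio_bytes: bytes, text: str, request_id: int = 1) -> dict:
    """Wrap audio and reference text in an MCP tools/call message.

    Hypothetical helper: only the tool name and the two argument keys
    come from the documentation above.
    """
    arguments = {
        # The tool expects the audio as a base64 string, not raw bytes.
        "audio_base64": base64.b64encode(audio_bytes).decode("ascii"),
        "text": text,
    }
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "assess_pronunciation", "arguments": arguments},
    }


if __name__ == "__main__":
    # Placeholder bytes; in practice read them from a recording,
    # e.g. open("sample.wav", "rb").read()
    placeholder_audio = b"not-real-audio"
    request = build_assess_request(
        placeholder_audio, "The quick brown fox jumps over the lazy dog"
    )
    print(json.dumps(request, indent=2))
```

The `text` field carries the reference sentence the speaker was asked to read, so it should match the recording.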

Output:

{
  "overall_score": 72.5,
  "sentence_score": 75.0,
  "words": [
    {"word": "The", "score": 85.0, "phonemes": [...]},
    ...
  ]
}
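
One way an agent might act on this output is to pick out the lowest-scoring words for targeted feedback. The sketch below assumes the response has already been parsed into a dictionary with the `words` list shown above; the function name `weakest_words`, the threshold of 60, and the sample scores are all made up for illustration and are not real service output.

```python
def weakest_words(result: dict, threshold: float = 60.0) -> list[tuple[str, float]]:
    """Return (word, score) pairs below the threshold, worst first.

    The threshold is an arbitrary illustrative cutoff, not a value
    recommended by the service.
    """
    words = result.get("words", [])
    low = [(w["word"], w["score"]) for w in words if w["score"] < threshold]
    return sorted(low, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Invented sample in the documented shape (phoneme details omitted).
    sample = {
        "overall_score": 72.5,
        "sentence_score": 75.0,
        "words": [
            {"word": "The", "score": 85.0},
            {"word": "quick", "score": 55.5},
            {"word": "fox", "score": 41.0},
        ],
    }
    for word, score in weakest_words(sample):
        print(f"{word}: {score}")
```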

Tool: check_pronunciation_service

Returns the service health status, model version, and model size.

Pricing

$0.02 per assessment (pay-per-event).

Technical Details

  • Model: Conformer-CTC Small (17MB, INT8 quantized)
  • Audio: 16kHz mono, supports WAV/MP3/OGG/WebM/M4A
  • Backend: Azure Container Apps, auto-scaling
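
Since the audio format is listed as 16 kHz mono, a client may want to check a WAV recording before sending it. The sketch below uses Python's standard `wave` module to do that. Whether the server resamples other inputs is not stated here, so treat this as an optional pre-flight check; the helper names `is_16k_mono` and `make_silent_wav` are hypothetical.

```python
import io
import wave


def is_16k_mono(wav_bytes: bytes) -> bool:
    """Check that WAV data is sampled at 16 kHz with a single channel."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getframerate() == 16000 and wav.getnchannels() == 1


def make_silent_wav(sample_rate: int, channels: int, n_frames: int = 160) -> bytes:
    """Build a short silent 16-bit WAV in memory (for demonstration only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * channels * n_frames)
    return buffer.getvalue()


if __name__ == "__main__":
    print(is_16k_mono(make_silent_wav(16000, 1)))
    print(is_16k_mono(make_silent_wav(44100, 2)))
```

This check only applies to WAV input; the compressed formats (MP3, OGG, WebM, M4A) would need a separate decoder to inspect.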