Speechmatics

Speechmatics · Ranked #6 of 8 in Speech-to-Text APIs

76.6 / 100

B · Strong

UK-based accuracy and multilingual specialist (55+ languages, strong accent coverage) with batch, real-time, and on-prem deployment options.

Best for

Multilingual accuracy, on-prem capable

Overview

Speechmatics is a UK-based (Cambridge) speech-to-text vendor that positions itself at the accuracy-and-language-coverage end of the ASR market. Its core proposition is "Ursa"-generation models that transcribe 55+ languages with strong robustness to accents, dialects and noisy audio, delivered through both batch and real-time (WebSocket) APIs as well as on-premises/container deployments for enterprises with data-residency or air-gap requirements. Beyond raw transcription, the platform bundles speaker diarization (real-time and batch), translation into 30+ languages from a single call, custom dictionary/vocabulary, and more recently a Flow voice-agent stack plus text-to-speech, broadening it from a pure ASR API into a fuller voice-AI toolkit.

The provider's strongest selling point is accuracy across the long tail of languages and accents. Speechmatics' own published benchmark cites a 1.07% English word error rate (WER) versus 1.62% for Deepgram, and the company claims to be the most accurate vendor in 93.7% of language comparisons. These figures are vendor-sourced and should be read with that caveat, but third-party leaderboards (Hugging Face Open ASR) do place its Ursa model in the top tier alongside Whisper Large-v3, Deepgram Nova and AssemblyAI Universal. It is also recognized on Daily/Pipecat's latency "Pareto frontier" for real-time voice agents. The trade-offs are price and gaps in coverage: reviewers repeatedly flag it as expensive relative to commodity APIs, some users note language gaps despite the 55+ count (e.g. historically weak Arabic support), and it lacks some of the analytics/summarization conveniences bundled by competitors like AssemblyAI.
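For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below implements that standard definition. It is not Speechmatics' benchmark code, and the text normalization (casing, punctuation) the vendor applies before scoring is not stated in the source, so none is applied here. The function name `word_error_rate` is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

On this definition, a 1.07% WER corresponds to roughly one word error per 93 reference words, and 1.62% to roughly one per 62.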

Who it's for: enterprises and product teams that need defensible accuracy across many languages/accents, real-time captioning, or on-prem deployment, and are willing to pay a premium and do more integration work than a turnkey transcription SaaS would require. It is less compelling for hobbyists or cost-sensitive teams who can tolerate open-source Whisper or cheaper commodity APIs. Documentation and SDK quality are solid (typed, modular Python SDKs; clear API reference), and G2 sentiment is unusually high (4.8/5), but the small public review volume (~40 G2 reviews, 2 on Gartner) means the aggregate signal is thinner than for larger cloud vendors.

How this score is derived

The APIbenchmarks Index is a weighted sum of four dimensions, each scored on an absolute 0–100 reference scale. See the methodology for every mapping.

Dimension	Score	Weight	Contribution
Documentation & DX: Comprehensive developer docs at docs.speechmatics.com plus a dedicated SDK reference site, covering batch, real-time WebSocket, diarization, translation and voice-agent flows with code samples.	76	30%	22.8
Reliability: A public status page (status.speechmatics.com) monitors 13 components across 3 groups and reports systems operational, but no numeric uptime-percentage SLA is published openly for self-serve tiers.	80	25%	20.0
Ecosystem & SDKs: Modular official SDKs (speechmatics-batch, -rt, -voice, -tts) on PyPI/GitHub, a CLI, multi-region cloud plus on-prem containers, and a Flow voice-agent stack extend it beyond a single API.	68	25%	17.0
Accessibility: A genuinely usable free tier (3,000 STT minutes/month, 2 concurrent real-time sessions) and a startup credits program (up to $50k) lower the barrier, though headline per-hour pricing skews premium versus commodity APIs.	84	20%	16.8
APIbenchmarks Index (ABI)			76.6

Table 1. Derivation of the ABI for Speechmatics. Contribution = score × weight; the index is their sum.
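The derivation in Table 1 can be reproduced in a few lines. This is a minimal sketch using only the scores and weights shown in the table. The names `DIMENSIONS` and `abi` are illustrative and not part of any published APIbenchmarks tooling.

```python
# Scores (0-100) and weights copied from Table 1.
DIMENSIONS = {
    "Documentation & DX": (76, 0.30),
    "Reliability": (80, 0.25),
    "Ecosystem & SDKs": (68, 0.25),
    "Accessibility": (84, 0.20),
}


def abi(dimensions: dict) -> float:
    """Weighted sum of (score, weight) pairs, rounded to one decimal place."""
    total_weight = sum(weight for _, weight in dimensions.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return round(sum(score * weight for score, weight in dimensions.values()), 1)


if __name__ == "__main__":
    for name, (score, weight) in DIMENSIONS.items():
        print(f"{name}: {score} x {weight:.2f} = {score * weight:.1f}")
    print("ABI:", abi(DIMENSIONS))
```

The per-dimension contributions come out as 22.8, 20.0, 17.0 and 16.8, which sum to the published 76.6.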

At a glance

Vendor: Speechmatics
Pricing model: Per minute (usage-based)
Free tier: 3,000 STT min/mo (50 hrs)
Official SDKs: 5 languages

Pricing

Free	$0/mo	3,000 STT minutes/month (50 hrs) + 1M TTS characters/month, 2 concurrent real-time sessions, 56+ languages, multi-region cloud.
Pro (usage-based)	From ~$0.129/hour (batch from ~$0.0050/min, real-time from ~$0.0067/min)	Includes 3,000 free STT minutes/month then pay-as-you-go; up to 50 concurrent real-time sessions, 10 file jobs/sec, capped at 6,000 hours/month; billed to the second; email support.
Enterprise	Custom	No rate limits, all features incl. audio alignment, custom models/voices, SaaS or on-premises deployment, highest concurrency, dedicated support; volume discounts that scale.
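To get a rough feel for Pro-tier spend, the sketch below applies the per-minute "from" prices and the free allowance listed above. It rests on several assumptions that the pricing table does not confirm: that the 3,000 free minutes are deducted before any billing, that a single rate applies to the whole month, and that the "from" prices are the rates actually charged. Treat the result as a lower-bound estimate. The function and constant names are hypothetical.

```python
FREE_STT_MINUTES = 3_000        # monthly free allowance from the pricing table
PRO_CAP_MINUTES = 6_000 * 60    # Pro tier cap of 6,000 hours/month, in minutes
RATE_PER_MINUTE_USD = {         # "from" prices; actual rates may be higher
    "batch": 0.0050,
    "realtime": 0.0067,
}


def estimate_pro_cost(minutes: float, mode: str = "batch") -> float:
    """Estimate monthly Pro-tier cost in USD for a given number of STT minutes."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    if minutes > PRO_CAP_MINUTES:
        raise ValueError("usage exceeds the Pro cap; Enterprise pricing is custom")
    # Assumption: the free allowance is consumed first, then usage is billed.
    billable = max(0.0, minutes - FREE_STT_MINUTES)
    return round(billable * RATE_PER_MINUTE_USD[mode], 2)


if __name__ == "__main__":
    for mode in ("batch", "realtime"):
        print(mode, estimate_pro_cost(13_000, mode))
```

Under these assumptions, usage at or below 3,000 minutes costs nothing, and anything above 360,000 minutes (6,000 hours) falls outside the Pro tier.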

Key features

•Batch (async file) speech-to-text API
•Real-time streaming transcription over WebSocket
•Speaker diarization (real-time and batch)
•Translation into 30+ languages from a single transcription call
•56+ language support with accent/dialect robustness
•Custom dictionary / custom vocabulary and custom model development (Enterprise)
•Text-to-Speech (natural AI voices)
•Flow voice-agent framework for building conversational voice apps
•On-premises / containerized deployment option
•Audio alignment (Enterprise)

Official SDKs

•Python (speechmatics-python-sdk; modular speechmatics-batch, speechmatics-rt, speechmatics-voice, speechmatics-tts)
•Python CLI
•JavaScript / TypeScript
•REST / HTTP batch API
•Real-time WebSocket API

Strengths & trade-offs

Strengths

+Top-tier transcription accuracy, especially across diverse accents/dialects and non-English languages (vendor cites 1.07% English WER)
+Broad language coverage: 55+ languages for both batch and real-time, with translation into 30+ languages in a single call
+Strong real-time/streaming API recognized on the latency Pareto frontier for voice agents (Daily/Pipecat)
+Speaker diarization in both real-time and batch modes, with tunable sensitivity and max-speaker controls
+On-premises / container deployment available for data-residency and air-gapped use cases
+Usable free tier (3,000 min/month) plus startup credits up to $50,000

Trade-offs

–Premium pricing: reviewers repeatedly say cost limits accessibility versus commodity APIs and open-source Whisper
–Language gaps despite the 55+ count (users note weak/absent Arabic and uneven quality in some languages)
–Fewer bundled value-add analytics (e.g. summarization) than some competitors like AssemblyAI
–Pro tier has a hard 6,000-hours/month cap, pushing heavy users to custom Enterprise contracts
–Thin public review volume (~40 G2, 2 Gartner) makes the aggregate sentiment less statistically robust
–No openly published numeric uptime SLA for self-serve tiers

What developers say

G2 4.8/5 · ~40 reviews; Gartner Peer Insights 4/5 · 2 reviews

Sentiment is strongly positive on accuracy, speed and language coverage, with the main recurring criticisms being high price and gaps in some languages.

“Users consistently praise the product for its exceptional accuracy and fast performance... noting its performance even in challenging audio conditions and diverse accents.”

Key figures

English Word Error Rate (WER)	1.07% (vs Deepgram 1.62%)	Speechmatics published comparison ↗
Most-accurate-vendor rate across languages	93.73% of language comparisons	Speechmatics ↗
Batch transcription price	from ~$0.0050/min	Speechmatics pricing / G2 pricing ↗
Real-time transcription price	from ~$0.0067/min (~$0.129/hour)	Speechmatics pricing ↗
Free tier allowance	3,000 STT minutes/month + 1M TTS chars/month	Speechmatics pricing ↗
Pro tier concurrency / cap	50 real-time sessions; capped at 6,000 hrs/month	Speechmatics pricing ↗
Real-time latency positioning	On the latency 'Pareto frontier' for voice agents	Daily/Pipecat benchmark (cited by Speechmatics) ↗

Compare Speechmatics head to head

•Speechmatics vs Deepgram
•Speechmatics vs AssemblyAI
•Speechmatics vs OpenAI Whisper / GPT-4o Transcribe
•Speechmatics vs Google Cloud Speech-to-Text
•Speechmatics vs Amazon Transcribe
•Speechmatics vs Gladia
•Speechmatics vs Rev AI

Sources

Figures last verified 2026-06-27. Spotted an error? corrections@apibenchmarks.com