Deepgram
Deepgram · Ranked #1 of 8 in Speech-to-Text APIs
Voice-AI specialist with the Nova-3 and Flux models, known for sub-300ms streaming latency and a developer-first console.
Real-time streaming STT for voice agents

Overview
Deepgram is a developer-first speech AI platform whose core product is a low-latency, high-throughput speech-to-text (STT) API. Its flagship Nova-3 model targets real-time, production-scale voice applications (contact centers, voice agents, meeting and transcription tooling, and media workflows) where streaming latency and cost per hour of audio matter as much as raw accuracy. Deepgram differentiates on speed (sub-300ms streaming latency), aggressive per-minute pricing, and a clean API surface with official SDKs across the major backend languages. Beyond STT, it now ships a Text-to-Speech line (Aura-1/Aura-2) and a Voice Agent API, positioning itself as a full voice stack rather than a transcription-only vendor.
Deepgram's strongest fit is real-time, high-volume English (and increasingly multilingual) workloads where its streaming latency and price per hour of audio beat incumbents like Google, AWS Transcribe, and Whisper-based services. Its own benchmarks claim a 5.26% batch word error rate (WER) and a 54.2% reduction in streaming WER versus the next-best competitor, but those tests are vendor-run on Deepgram-selected datasets; independent third-party measurement (Artificial Analysis) puts real-world Nova-3 WER at roughly 18%, so buyers should validate on their own audio. Nova-3 added keyterm prompting (up to 100 terms / 500 multilingual tokens) to recover domain accuracy, plus real-time multilingual code-switching and support for 30+ languages.
The main trade-offs are accuracy on hard inputs (strong accents, overlapping speakers, less-common languages), cost at very large volumes, and some model-maturity gripes (users report Nova-2 occasionally outperforming Nova-3, and feature gaps between model generations). For teams that value maximum accuracy on noisy multi-speaker audio over raw speed, AssemblyAI or Whisper-class models are common alternatives. For latency-sensitive voice agents and streaming transcription at scale, however, Deepgram remains a top choice, with a 4.6/5 rating on G2.
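Since vendor and third-party WER figures diverge this much, it can help to measure WER on a sample of your own audio. The following is a minimal sketch of the standard word-level edit-distance definition, not the scoring script used by Deepgram or Artificial Analysis. The function name `word_error_rate` is illustrative, and the only normalization applied is lowercasing and whitespace splitting; real evaluations usually also normalize punctuation and numerals, which changes the result.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    human = "the cat sat on the mat"
    machine = "the cat sat on a mat"
    print(f"WER: {word_error_rate(human, machine):.2%}")
```

Run it over human-verified transcripts of representative calls or recordings and average across files weighted by reference length, so that short clips do not dominate the score.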
How this score is derived
The APIbenchmarks Index is a weighted sum of four dimensions, each scored on an absolute 0–100 reference scale. See the methodology for every mapping.
| Dimension | Score | Weight | Contribution |
|---|---|---|---|
| Documentation & DX: extensive, developer-friendly docs at developers.deepgram.com with quickstarts, an API playground, a CLI, and an SDK feature matrix, consistently praised in reviews for clarity. | 90 | 30% | 27.0 |
| Reliability: public status page at status.deepgram.com and a 99.9% uptime SLA on Dedicated/enterprise tiers, with architecture proven to scale from 1,000 to 140,000 concurrent calls. | 84 | 25% | 21.0 |
| Ecosystem & SDKs: official SDKs in Python, JavaScript/TypeScript, Go, .NET and Java (Rust community), plus AWS Marketplace listing, integrations, and a full voice stack (STT, TTS, Voice Agent). | 82 | 25% | 20.5 |
| Accessibility: self-serve sign-up with no credit card and $200 free credit lowers the barrier, though deep multilingual coverage and self-hosted/Dedicated options sit behind enterprise plans. | 95 | 20% | 19.0 |
| APIbenchmarks Index (ABI) | 87.5 | | |
Table 1. Derivation of the ABI for Deepgram. Contribution = score × weight; the index is their sum.
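The arithmetic behind Table 1 can be reproduced in a few lines. This is an illustrative sketch only: the names `DIMENSIONS` and `abi` are hypothetical, and the mapping from raw evidence to each 0–100 score (covered by the methodology) is not modeled here.

```python
# (score, weight) pairs copied from Table 1.
DIMENSIONS = {
    "Documentation & DX": (90, 0.30),
    "Reliability": (84, 0.25),
    "Ecosystem & SDKs": (82, 0.25),
    "Accessibility": (95, 0.20),
}


def abi(dimensions: dict) -> float:
    """Weighted sum of dimension scores, rounded to one decimal place."""
    total_weight = sum(weight for _, weight in dimensions.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"weights must sum to 1.0, got {total_weight}")
    return round(sum(score * weight for score, weight in dimensions.values()), 1)


if __name__ == "__main__":
    for name, (score, weight) in DIMENSIONS.items():
        print(f"{name}: {score} x {weight:.2f} = {score * weight:.1f}")
    print("ABI:", abi(DIMENSIONS))  # 27.0 + 21.0 + 20.5 + 19.0 = 87.5
```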
At a glance
- Vendor: Deepgram
- Pricing model: Per minute (usage-based)
- Free tier: $200 credit, no card
- Official SDKs: 9 languages
Pricing
| Plan / product | Price | Notes |
|---|---|---|
| Pay As You Go | $0.0077/min pre-recorded; $0.0048/min streaming (Nova-3 monolingual) | No minimums, no credit card; new users get $200 free credit. Multilingual Nova-3 is $0.0092/min pre-recorded and $0.0058/min streaming. |
| Growth | $0.0065/min pre-recorded; $0.0042/min streaming (Nova-3 monolingual) | Annual pre-paid credits save up to ~20%; higher concurrency limits. Commonly cited ~$4,000 annual minimum. |
| Enterprise | Custom | For large volumes, data residency / self-hosted (Dedicated single-tenant) deployments, 99.9% uptime SLA, and dedicated support. |
| Aura-2 Text-to-Speech | $0.030 / 1,000 chars (PAYG); $0.027 (Growth) | Aura-1 is $0.0150 / 1,000 chars (PAYG). |
| Voice Agent API | From $0.075/min (PAYG); $0.068/min (Growth) | Bundled STT+LLM+TTS; advanced tiers up to ~$0.163/min, with discounts for bring-your-own LLM/TTS. |
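To see when the Growth plan pays off, the streaming rates above can be plugged into a rough annual estimate. This sketch makes two assumptions that the table does not confirm: that the "commonly cited" ~$4,000 annual minimum acts as a hard floor on Growth spend, and that no pre-paid credit discount applies. The constant and function names are hypothetical.

```python
from decimal import Decimal

# Nova-3 monolingual streaming rates per minute, from the pricing table.
PAYG_RATE = Decimal("0.0048")
GROWTH_RATE = Decimal("0.0042")
# Assumption: the commonly cited annual minimum is a floor on Growth spend.
GROWTH_ANNUAL_MINIMUM = Decimal("4000")

CENTS = Decimal("0.01")


def annual_streaming_cost(minutes_per_year: int) -> dict:
    """Estimated yearly streaming cost in USD on each self-serve plan."""
    if minutes_per_year < 0:
        raise ValueError("minutes_per_year must be non-negative")
    payg = PAYG_RATE * minutes_per_year
    growth = max(GROWTH_RATE * minutes_per_year, GROWTH_ANNUAL_MINIMUM)
    return {"payg": payg.quantize(CENTS), "growth": growth.quantize(CENTS)}


if __name__ == "__main__":
    for minutes in (500_000, 1_000_000):
        print(minutes, annual_streaming_cost(minutes))
```

Under these assumptions, Growth only becomes the cheaper option above roughly 833,000 streaming minutes a year (4,000 / 0.0048), which is the point where Pay As You Go spend passes the annual minimum.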
Key features
- Nova-3 STT for pre-recorded (batch) and real-time streaming audio
- Speaker diarization (multi-speaker labeling)
- Smart Formatting (punctuation, casing, dates, currency) at no extra charge
- Keyterm prompting, up to 100 terms (500 multilingual tokens) for domain vocabulary
- Real-time multilingual transcription and code-switching, 30+ languages with auto language detection
- PII redaction / personal information removal
- Audio intelligence: sentiment analysis, entity extraction, summarization, topic detection
- Aura-1/Aura-2 Text-to-Speech
- Voice Agent API (bundled STT + LLM + TTS)
- Self-hosted / Dedicated single-tenant deployment for enterprise
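As a rough illustration of how several of these features combine in one pre-recorded request, the sketch below builds (but does not send) an HTTP request using only the Python standard library. The endpoint `https://api.deepgram.com/v1/listen`, the query parameters `model`, `smart_format`, `diarize`, and `keyterm`, and the `Token` authorization scheme reflect our understanding of Deepgram's public REST API and should be checked against developers.deepgram.com before use. The helper name `build_listen_request` and the placeholder URL are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode

MAX_KEYTERMS = 100  # keyterm limit quoted in the feature list above


def build_listen_request(api_key: str, audio_url: str, keyterms=()) -> urllib.request.Request:
    """Build a pre-recorded transcription request for a hosted audio file."""
    keyterms = list(keyterms)
    if len(keyterms) > MAX_KEYTERMS:
        raise ValueError(f"at most {MAX_KEYTERMS} keyterms are supported")

    # Assumed parameter names; each keyterm is sent as a repeated query parameter.
    params = [("model", "nova-3"), ("smart_format", "true"), ("diarize", "true")]
    params += [("keyterm", term) for term in keyterms]

    return urllib.request.Request(
        url="https://api.deepgram.com/v1/listen?" + urlencode(params),
        data=json.dumps({"url": audio_url}).encode("utf-8"),
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_listen_request(
        api_key="YOUR_API_KEY",                      # placeholder
        audio_url="https://example.com/call.wav",    # placeholder
        keyterms=["Nova-3", "Aura"],
    )
    print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; keeping construction separate from sending makes the parameters easy to inspect and test without an API key. In production, the official SDKs are likely the more convenient route.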
Strengths & trade-offs
- + Very low streaming latency (sub-300ms), making it well-suited to real-time voice agents and live captioning
- + Among the cheapest STT pricing at scale ($0.0065-$0.0077/min for Nova-3), with per-second billing (no minute rounding)
- + Clean, developer-friendly API with official SDKs and consistently praised documentation
- + Keyterm prompting (up to 100 terms) lets developers boost domain/brand vocabulary accuracy without retraining
- + Self-serve onboarding: no credit card and $200 free credit to start
- + Full voice stack (STT, Aura TTS, Voice Agent API) plus self-hosted/Dedicated enterprise deployment options
- – Real-world accuracy on third-party benchmarks (Artificial Analysis ~18% WER) is far worse than vendor-claimed 5.26%, so headline numbers overstate hard-audio performance
- – Struggles relative to competitors on strong accents, overlapping speech, and noisy multi-speaker audio
- – Language support is narrower/less accurate for less-common and non-English languages
- – Cost can climb sharply at very high audio volumes despite low unit price
- – Model-maturity gaps: users report Nova-2 sometimes outperforming Nova-3, and feature parity issues between model versions
- – Higher concurrency requires moving off Pay-As-You-Go to Growth/Enterprise (with annual minimums)
What developers say
G2 4.6/5 (~325 reviews)
Developers broadly praise Deepgram for speed, real-time streaming, easy integration, and clear docs, while critiques center on accuracy with accents/noisy multi-speaker audio, limited non-English language support, and cost at high volume.
“Deepgram provides very accurate and fast speech-to-text transcription, even for long audio recordings and real-time streams; the API is easy to integrate and the documentation is clear and developer-friendly.”
Key figures
| Metric | Value | Source |
|---|---|---|
| Batch WER (Nova-3, vendor test set) | 5.26% | Deepgram (Introducing Nova-3) |
| Streaming median WER (Nova-3, vendor test set) | 6.84% | Deepgram (Introducing Nova-3) |
| Real-world Word Error Rate (Nova-3, third-party) | ~18% | Artificial Analysis Speech-to-Text Index |
| Streaming WER reduction vs next-best competitor (vendor claim) | 54.2% | Deepgram (Introducing Nova-3) |
| Uptime SLA (Dedicated/Enterprise) | 99.9% | Deepgram Dedicated |
| Nova-3 monolingual price (pre-recorded, PAYG) | $0.0077/min | Deepgram pricing page |
| Concurrency scaling (contact center) | 1,000 to 140,000 simultaneous calls | Deepgram |
Sources
- https://deepgram.com/pricing
- https://deepgram.com/learn/introducing-nova-3-speech-to-text-api
- https://developers.deepgram.com/sdks/sdk-features
- https://artificialanalysis.ai/speech-to-text/models/deepgram
- https://www.g2.com/products/deepgram/reviews
- https://status.deepgram.com/
- https://deepgram.com/dedicated
- https://developers.deepgram.com/docs/keyterm
- https://diyai.io/ai-tools/speech-to-text/reviews/deepgram-ai-review/
Figures last verified 2026-06-27. Spotted an error? corrections@apibenchmarks.com
