Deepgram

Deepgram · Ranked #1 of 8 in Speech-to-Text APIs

87.5 / 100

A (Excellent)

Voice-AI specialist with the Nova-3 and Flux models, known for sub-300ms streaming latency and a developer-first console.

Best for

Real-time streaming STT for voice agents

Overview

Deepgram is a developer-first speech AI platform whose core product is a low-latency, high-throughput speech-to-text (STT) API. Its flagship Nova-3 model targets real-time, production-scale voice applications (contact centers, voice agents, meeting and transcription tooling, and media workflows), where streaming latency and unit cost matter as much as raw accuracy. Deepgram differentiates on speed (sub-300ms streaming latency), aggressive usage-based pricing, and a clean API surface with official SDKs across the major backend languages. Beyond STT, it now ships a Text-to-Speech line (Aura-1/Aura-2) and a Voice Agent API, positioning itself as a full voice stack rather than a transcription-only vendor.

Deepgram's strongest fit is real-time, high-volume English (and increasingly multilingual) workloads where its streaming latency and unit price beat alternatives such as Google, AWS Transcribe, and Whisper-based services. Its own benchmarks claim a 5.26% batch word error rate (WER) and a 54.2% streaming WER reduction versus the next-best competitor, but those tests are vendor-run on Deepgram-selected datasets. Third-party measurement (Artificial Analysis) puts real-world Nova-3 WER closer to ~18%, so buyers should validate on their own audio. Nova-3 added keyterm prompting (up to 100 terms / 500 multilingual tokens) to recover domain accuracy, plus real-time multilingual code-switching and support for 30+ languages.
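Validating on your own audio comes down to computing WER against a trusted reference transcript. The sketch below is a minimal, assumption-laden version: it lowercases and splits on whitespace only, whereas published benchmarks typically also normalize punctuation, numbers, and spelling, so its output will not be directly comparable to the vendor or Artificial Analysis figures. The function name is illustrative and is not part of any Deepgram SDK.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of four reference words.
print(word_error_rate("the quick brown fox", "the quick fox"))
```

Running this over a few hours of representative recordings, per vendor, gives a like-for-like comparison that headline numbers cannot.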

The main trade-offs are accuracy on hard inputs (strong accents, overlapping speakers, less-common languages), cost at very large volumes, and some model-maturity complaints: users report Nova-2 occasionally outperforming Nova-3, and feature gaps between model generations. For teams that prioritize maximum accuracy on noisy multi-speaker audio over raw speed, AssemblyAI or Whisper-class models are common alternatives. For latency-sensitive voice agents and streaming transcription at scale, however, Deepgram remains a leading choice and is highly rated on G2 (4.6/5).

How this score is derived

The APIbenchmarks Index is a weighted sum of four dimensions, each scored on an absolute 0–100 reference scale. See the methodology for every mapping.

Dimension	Rationale	Score	Weight	Contribution
Documentation & DX	Extensive, developer-friendly docs at developers.deepgram.com with quickstarts, an API playground, a CLI, and an SDK feature matrix, consistently praised in reviews for clarity.	90	30%	27.0
Reliability	Public status page at status.deepgram.com and a 99.9% uptime SLA on Dedicated/enterprise tiers, with architecture proven to scale from 1,000 to 140,000 concurrent calls.	84	25%	21.0
Ecosystem & SDKs	Official SDKs in Python, JavaScript/TypeScript, Go, .NET and Java (Rust community), plus AWS Marketplace listing, integrations, and a full voice stack (STT, TTS, Voice Agent).	82	25%	20.5
Accessibility	Self-serve sign-up with no credit card and $200 free credit lowers the barrier, though deep multilingual coverage and self-hosted/Dedicated options sit behind enterprise plans.	95	20%	19.0
APIbenchmarks Index (ABI)				87.5

Table 1. Derivation of the ABI for Deepgram. Contribution = score × weight; the index is their sum.
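The arithmetic in Table 1 can be reproduced in a few lines. This is only a sketch of the weighted sum described above; the scores and weights are copied from the table, while the dictionary layout and function name are illustrative rather than anything APIbenchmarks publishes.

```python
# (score, weight) pairs copied from Table 1.
DIMENSIONS = {
    "Documentation & DX": (90, 0.30),
    "Reliability": (84, 0.25),
    "Ecosystem & SDKs": (82, 0.25),
    "Accessibility": (95, 0.20),
}


def abi(dimensions):
    """Return the weighted sum of scores; weights are expected to sum to 1."""
    total_weight = sum(weight for _, weight in dimensions.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(score * weight for score, weight in dimensions.values())


index = abi(DIMENSIONS)
print(round(index, 1))  # 27.0 + 21.0 + 20.5 + 19.0
```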

At a glance

Vendor: Deepgram
Pricing model: Per minute (usage-based)
Free tier: $200 credit, no card
Official SDKs: 5 languages (Python, JavaScript/TypeScript, Go, .NET, Java), plus a community Rust SDK

Pricing

Plan	Price	Notes
Pay As You Go	$0.0077/min pre-recorded; $0.0048/min streaming (Nova-3 monolingual)	No minimums, no credit card; new users get $200 free credit. Multilingual Nova-3 is $0.0092/min pre-recorded and $0.0058/min streaming.
Growth	$0.0065/min pre-recorded; $0.0042/min streaming (Nova-3 monolingual)	Annual pre-paid credits save up to ~20%; higher concurrency limits. Commonly cited ~$4,000 annual minimum.
Enterprise	Custom	For large volumes, data residency / self-hosted (Dedicated single-tenant) deployments, 99.9% uptime SLA, and dedicated support.
Aura-2 Text-to-Speech	$0.030 / 1,000 chars (PAYG); $0.027 / 1,000 chars (Growth)	Aura-1 is $0.0150 / 1,000 chars (PAYG).
Voice Agent API	From $0.075/min (PAYG); $0.068/min (Growth)	Bundled STT+LLM+TTS; advanced tiers up to ~$0.163/min, with discounts for bring-your-own LLM/TTS.
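To see when the Growth plan pays off, it helps to compare annual streaming spend on each plan. The sketch below uses the Nova-3 monolingual streaming rates listed above and makes two assumptions that the pricing notes do not confirm: that the ~$4,000 figure behaves as a hard annual floor, and that streaming is the only usage on the account. All names are illustrative.

```python
PAYG_STREAMING_PER_MIN = 0.0048    # Pay As You Go, Nova-3 monolingual streaming
GROWTH_STREAMING_PER_MIN = 0.0042  # Growth, Nova-3 monolingual streaming
GROWTH_ANNUAL_MINIMUM = 4000.0     # assumption: treated as a hard annual floor


def streaming_cost(seconds: float, rate_per_min: float) -> float:
    """Cost of one stream, billed per second rather than rounded up to minutes."""
    return seconds / 60.0 * rate_per_min


def annual_cost_payg(minutes_per_year: float) -> float:
    return minutes_per_year * PAYG_STREAMING_PER_MIN


def annual_cost_growth(minutes_per_year: float) -> float:
    # Usage below the minimum is still billed at the minimum (assumed).
    return max(GROWTH_ANNUAL_MINIMUM, minutes_per_year * GROWTH_STREAMING_PER_MIN)


def break_even_minutes() -> float:
    """Annual volume above which Growth becomes cheaper than Pay As You Go."""
    return GROWTH_ANNUAL_MINIMUM / PAYG_STREAMING_PER_MIN


minutes = break_even_minutes()
print(f"break-even: {minutes:,.0f} min/year (~{minutes / 60:,.0f} hours)")
```

Under these assumptions the crossover sits at roughly 833,000 streaming minutes a year; below that, Pay As You Go costs less even though its per-minute rate is higher.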

Key features

• Nova-3 STT for pre-recorded (batch) and real-time streaming audio
• Speaker diarization (multi-speaker labeling)
• Smart Formatting (punctuation, casing, dates, currency) at no extra charge
• Keyterm prompting, up to 100 terms (500 multilingual tokens) for domain vocabulary
• Real-time multilingual transcription and code-switching, 30+ languages with auto language detection
• PII redaction / personal information removal
• Audio intelligence: sentiment analysis, entity extraction, summarization, topic detection
• Aura-1/Aura-2 Text-to-Speech
• Voice Agent API (bundled STT + LLM + TTS)
• Self-hosted / Dedicated single-tenant deployment for enterprise

Official SDKs

• Python (pip install deepgram-sdk)
• JavaScript / TypeScript (@deepgram/sdk)
• Go (deepgram-go-sdk)
• .NET / C# (Deepgram NuGet)
• Java (com.deepgram:deepgram-java-sdk)
• Rust (community SDK)
• CLI tool
• REST API
• WebSocket streaming API
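For teams that prefer the REST API over an SDK, a pre-recorded request can be assembled with the Python standard library alone. The sketch below only builds the request and does not send it. The endpoint, the `Token` authorization scheme, and the `model`, `smart_format`, `diarize`, and `keyterm` query parameters reflect our reading of Deepgram's public API reference and should be checked against the current docs; the helper name, the placeholder key, and the example audio URL are hypothetical.

```python
import json
import urllib.parse
import urllib.request

DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"
MAX_KEYTERMS = 100  # keyterm limit quoted in this review


def build_listen_request(api_key, audio_url, keyterms=()):
    """Build (but do not send) a POST request to transcribe a hosted audio file."""
    if len(keyterms) > MAX_KEYTERMS:
        raise ValueError(f"at most {MAX_KEYTERMS} keyterms are supported")

    params = [("model", "nova-3"), ("smart_format", "true"), ("diarize", "true")]
    # Assumption: each keyterm is passed as its own repeated query parameter.
    params += [("keyterm", term) for term in keyterms]

    url = DEEPGRAM_LISTEN_URL + "?" + urllib.parse.urlencode(params)
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_listen_request(
    "YOUR_API_KEY",                  # placeholder, not a real key
    "https://example.com/call.wav",  # hypothetical audio location
    keyterms=["Deepgram", "Nova-3"],
)
print(req.full_url)
```

Passing the request to `urllib.request.urlopen` with a valid key would send it. The official SDKs wrap the same call and add retries and typed responses, which is usually the better choice in production.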

Strengths & trade-offs

Strengths

+ Very low streaming latency (sub-300ms), making it well-suited to real-time voice agents and live captioning
+ Among the cheapest per-minute STT pricing at scale ($0.0065-$0.0077/min for Nova-3), with per-second billing (no minute rounding)
+ Clean, developer-friendly API with official SDKs and consistently praised documentation
+ Keyterm prompting (up to 100 terms) lets developers boost domain/brand vocabulary accuracy without retraining
+ Self-serve onboarding: no credit card and $200 free credit to start
+ Full voice stack (STT, Aura TTS, Voice Agent API) plus self-hosted/Dedicated enterprise deployment options

Trade-offs

– Real-world accuracy on third-party benchmarks (Artificial Analysis ~18% WER) is far worse than the vendor-claimed 5.26%, so headline numbers overstate hard-audio performance
– Struggles relative to competitors on strong accents, overlapping speech, and noisy multi-speaker audio
– Language support is narrower and less accurate for less-common and non-English languages
– Cost can climb sharply at very high audio volumes despite low unit price
– Model-maturity gaps: users report Nova-2 sometimes outperforming Nova-3, and feature parity issues between model versions
– Higher concurrency requires moving off Pay-As-You-Go to Growth/Enterprise (with annual minimums)

What developers say

G2 4.6/5 (~325 reviews)

Developers broadly praise Deepgram for speed, real-time streaming, easy integration, and clear docs, while critiques center on accuracy with accents/noisy multi-speaker audio, limited non-English language support, and cost at high volume.

“Deepgram provides very accurate and fast speech-to-text transcription, even for long audio recordings and real-time streams; the API is easy to integrate and the documentation is clear and developer-friendly.”

Key figures

Metric	Value	Source
Batch WER (Nova-3, vendor test set)	5.26%	Deepgram (Introducing Nova-3)
Streaming median WER (Nova-3, vendor test set)	6.84%	Deepgram (Introducing Nova-3)
Real-world Word Error Rate (Nova-3, third-party)	~18%	Artificial Analysis Speech-to-Text Index
Streaming WER reduction vs next-best competitor (vendor claim)	54.2%	Deepgram (Introducing Nova-3)
Uptime SLA (Dedicated/Enterprise)	99.9%	Deepgram Dedicated
Nova-3 monolingual price (pre-recorded, PAYG)	$0.0077/min	Deepgram pricing page
Concurrency scaling (contact center)	1,000 to 140,000 simultaneous calls	Deepgram
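The 54.2% reduction claim is easier to interpret once converted into an implied competitor WER. The sketch below assumes the figure is a relative reduction measured against the same 6.84% streaming median on the same test set, which the vendor material summarized here does not spell out, so treat the result as a rough reading rather than a reported number.

```python
def implied_baseline_wer(wer: float, relative_reduction: float) -> float:
    """Back out the baseline WER from a WER and a relative reduction.

    If wer = baseline * (1 - relative_reduction), then
    baseline = wer / (1 - relative_reduction).
    """
    if not 0.0 <= relative_reduction < 1.0:
        raise ValueError("relative_reduction must be in [0, 1)")
    return wer / (1.0 - relative_reduction)


# Vendor-reported streaming median WER (percent) and claimed reduction.
competitor_wer = implied_baseline_wer(6.84, 0.542)
print(f"implied next-best streaming WER: {competitor_wer:.1f}%")
```

Under that reading, the next-best competitor would sit near 15% streaming WER on Deepgram's test set, not far from the ~18% that the third-party index reports for Nova-3 itself on different audio. That gap between datasets is why the choice of test audio matters more than the headline percentage.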

Compare Deepgram head to head

Deepgram vs AssemblyAI · Deepgram vs OpenAI Whisper / GPT-4o Transcribe · Deepgram vs Google Cloud Speech-to-Text · Deepgram vs Amazon Transcribe · Deepgram vs Speechmatics · Deepgram vs Gladia · Deepgram vs Rev AI

Sources

Figures last verified 2026-06-27. Spotted an error? corrections@apibenchmarks.com