APIbenchmarks

Deepgram

Deepgram · Ranked #1 of 8 in Speech-to-Text APIs

87.5/100
A (Excellent)

Voice-AI specialist with the Nova-3 and Flux models, known for sub-300ms streaming latency and a developer-first console.

Best for

Real-time streaming STT for voice agents

Overview

Deepgram is a developer-first speech AI platform whose core product is a low-latency, high-throughput speech-to-text (STT) API. Its flagship Nova-3 model targets real-time, production-scale voice applications (contact centers, voice agents, meeting and transcription tooling, and media workflows) where streaming latency and cost per hour matter as much as raw accuracy. Deepgram differentiates on speed (sub-300ms streaming latency), aggressive per-minute pricing, and a clean API surface with official SDKs across the major backend languages. Beyond STT, it now ships a Text-to-Speech line (Aura-1/Aura-2) and a Voice Agent API, positioning itself as a full voice stack rather than a transcription-only vendor.

Deepgram's strongest fit is real-time, high-volume English (and increasingly multilingual) workloads where its streaming latency and price per hour beat incumbents like Google, AWS Transcribe, and Whisper-based services. Its own benchmarks claim a 5.26% batch WER and a 54.2% streaming WER reduction versus the next-best competitor, but those are vendor-run on Deepgram-selected datasets; independent third-party measurement (Artificial Analysis) puts real-world Nova-3 WER at roughly 18%. Because the two figures come from different test sets they are not directly comparable, and buyers should validate on their own audio. Nova-3 added keyterm prompting (up to 100 terms / 500 multilingual tokens) to recover domain accuracy, plus real-time multilingual code-switching and support for 30+ languages.
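A quick way to run that validation is to compute word error rate on a sample of your own hand-transcribed audio. The sketch below uses the standard definition, WER = (substitutions + deletions + insertions) / reference words, computed with a word-level edit distance. The lowercase-and-split normalization is a simplifying assumption; production evaluations usually also normalize punctuation, numbers, and casing conventions before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N using a word-level Levenshtein distance."""
    # Simplistic normalization (assumption): lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # At the start of row i, prev[j] is the edit distance between the first
    # i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("mat") over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on the"))
```

Scoring the same reference set against each candidate vendor's output gives a like-for-like comparison that neither the vendor nor a third-party index can provide for your specific audio.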

The main trade-offs are accuracy on hard inputs (strong accents, overlapping speakers, less-common languages), cost at very large volumes, and some model-maturity gripes (users report Nova-2 occasionally outperforming Nova-3, and feature gaps between model generations). Teams that prioritize accuracy on noisy multi-speaker audio over raw speed commonly look at AssemblyAI or Whisper-class models instead. For latency-sensitive voice agents and streaming transcription at scale, however, Deepgram remains a top choice, and it is highly rated on G2 (4.6/5).

How this score is derived

The APIbenchmarks Index is a weighted sum of four dimensions, each scored on an absolute 0–100 reference scale. See the methodology for every mapping.

Dimension | Score | Weight | Contribution | Rationale
Documentation & DX | 90 | 30% | 27.0 | Extensive, developer-friendly docs at developers.deepgram.com with quickstarts, an API playground, a CLI, and an SDK feature matrix, consistently praised in reviews for clarity.
Reliability | 84 | 25% | 21.0 | Public status page at status.deepgram.com and a 99.9% uptime SLA on Dedicated/enterprise tiers, with architecture proven to scale from 1,000 to 140,000 concurrent calls.
Ecosystem & SDKs | 82 | 25% | 20.5 | Official SDKs in Python, JavaScript/TypeScript, Go, .NET and Java (Rust community), plus AWS Marketplace listing, integrations, and a full voice stack (STT, TTS, Voice Agent).
Accessibility | 95 | 20% | 19.0 | Self-serve sign-up with no credit card and $200 free credit lowers the barrier, though deep multilingual coverage and self-hosted/Dedicated options sit behind enterprise plans.
APIbenchmarks Index (ABI) | | | 87.5 |

Table 1. Derivation of the ABI for Deepgram. Contribution = score × weight; the index is their sum.
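The weighted sum in Table 1 can be reproduced in a few lines. This sketch covers only the final arithmetic; the dimension scores themselves come from the methodology's reference-scale mappings, which are not reproduced here. Weights are kept as integer percentages so each contribution comes out exact in floating point.

```python
# Dimension scores (0-100) and weights (percent) as listed in Table 1.
DIMENSIONS = {
    "Documentation & DX": (90, 30),
    "Reliability": (84, 25),
    "Ecosystem & SDKs": (82, 25),
    "Accessibility": (95, 20),
}


def abi(dimensions: dict[str, tuple[int, int]]) -> float:
    """Weighted sum of dimension scores; the weights must total 100%."""
    total_weight = sum(weight for _, weight in dimensions.values())
    if total_weight != 100:
        raise ValueError(f"weights sum to {total_weight}%, expected 100%")
    return sum(score * weight / 100 for score, weight in dimensions.values())


if __name__ == "__main__":
    for name, (score, weight) in DIMENSIONS.items():
        print(f"{name}: {score} x {weight}% = {score * weight / 100}")
    print(f"ABI: {abi(DIMENSIONS)}")  # 27.0 + 21.0 + 20.5 + 19.0 = 87.5
```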

At a glance

Vendor: Deepgram
Pricing model: Per minute (usage-based)
Free tier: $200 credit, no card
Official SDKs: 5 languages (Python, JavaScript/TypeScript, Go, .NET, Java), plus a community Rust SDK

Pricing

Plan / product | Price | Notes
Pay As You Go | $0.0077/min pre-recorded; $0.0048/min streaming (Nova-3 monolingual) | No minimums, no credit card; new users get $200 free credit. Multilingual Nova-3 is $0.0092/min pre-recorded / $0.0058/min streaming.
Growth | $0.0065/min pre-recorded; $0.0042/min streaming (Nova-3 monolingual) | Annual prepaid credits save up to ~20%; higher concurrency limits. Commonly cited ~$4,000 annual minimum.
Enterprise | Custom | For large volumes, data residency / self-hosted (Dedicated single-tenant) deployments, 99.9% uptime SLA, and dedicated support.
Aura-2 Text-to-Speech | $0.030 / 1,000 chars (PAYG); $0.027 / 1,000 chars (Growth) | Aura-1 is $0.0150 / 1,000 chars (PAYG).
Voice Agent API | From $0.075/min (PAYG); $0.068/min (Growth) | Bundled STT + LLM + TTS; advanced tiers up to ~$0.163/min, with discounts for bring-your-own LLM/TTS.
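To compare tiers against your own volume, a rough estimator helps. This sketch uses only the per-minute streaming and per-1,000-character Aura-2 rates from the pricing table, and it treats the commonly cited ~$4,000 Growth minimum as an annual prepaid commitment drawn down at Growth rates. That treatment is an assumption to confirm with Deepgram, and rates change, so check the pricing page before relying on the output.

```python
# Rates (USD) copied from the pricing table; verify against deepgram.com/pricing before use.
STREAMING_PER_MIN = {"payg": 0.0048, "growth": 0.0042}  # Nova-3 monolingual streaming
AURA2_PER_1K_CHARS = {"payg": 0.030, "growth": 0.027}  # Aura-2 text-to-speech
GROWTH_ANNUAL_MINIMUM = 4_000.0  # "commonly cited" figure, not confirmed by Deepgram


def streaming_cost(minutes: float, tier: str) -> float:
    """Usage cost for streaming minutes at the given tier's rate."""
    return minutes * STREAMING_PER_MIN[tier]


def tts_cost(characters: int, tier: str) -> float:
    """Aura-2 cost, billed per 1,000 characters."""
    return characters / 1_000 * AURA2_PER_1K_CHARS[tier]


def annual_streaming_spend(minutes_per_year: float, tier: str) -> float:
    """Annual streaming spend; assumes the Growth minimum is owed even if unused."""
    usage = streaming_cost(minutes_per_year, tier)
    return max(usage, GROWTH_ANNUAL_MINIMUM) if tier == "growth" else usage


def growth_break_even_minutes() -> float:
    """Annual streaming minutes above which Growth costs less than Pay As You Go.

    Growth costs a flat $4,000 until usage exceeds the minimum, and because the
    Growth rate is lower, Pay As You Go usage reaches $4,000 first. The crossover
    is therefore where Pay As You Go usage alone equals the minimum.
    """
    return GROWTH_ANNUAL_MINIMUM / STREAMING_PER_MIN["payg"]


if __name__ == "__main__":
    print(f"10,000 h streaming, PAYG: ${streaming_cost(10_000 * 60, 'payg'):,.2f}")
    print(f"1M characters Aura-2, PAYG: ${tts_cost(1_000_000, 'payg'):,.2f}")
    m = growth_break_even_minutes()
    print(f"Growth is cheaper above ~{m:,.0f} min/yr (~{m / 60:,.0f} h/yr)")
```

With these inputs, 10,000 hours of streaming comes to about $2,880 on Pay As You Go, and the Growth commitment starts paying off at roughly 833,000 minutes (about 13,900 hours) a year.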

Key features

  • Nova-3 STT for pre-recorded (batch) and real-time streaming audio
  • Speaker diarization (multi-speaker labeling)
  • Smart Formatting (punctuation, casing, dates, currency) at no extra charge
  • Keyterm prompting, up to 100 terms (500 multilingual tokens) for domain vocabulary
  • Real-time multilingual transcription and code-switching, 30+ languages with auto language detection
  • PII redaction / personal information removal
  • Audio intelligence: sentiment analysis, entity extraction, summarization, topic detection
  • Aura-1/Aura-2 Text-to-Speech
  • Voice Agent API (bundled STT + LLM + TTS)
  • Self-hosted / Dedicated single-tenant deployment for enterprise
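As an illustration of how several of these features combine on a single request, the sketch below builds a query string for a Nova-3 `/v1/listen` call with Smart Formatting, diarization, and one repeated `keyterm` parameter per term, and it enforces the 100-term limit listed above. The endpoint and parameter names (`model`, `smart_format`, `diarize`, `keyterm`) are assumptions based on Deepgram's public docs and should be checked against the keyterm page before use; the sketch does not count the 500-token multilingual limit.

```python
from urllib.parse import urlencode

LISTEN_URL = "https://api.deepgram.com/v1/listen"  # assumed endpoint; confirm in Deepgram's API reference
MAX_KEYTERMS = 100  # Nova-3 keyterm limit cited in the feature list


def build_listen_url(keyterms: list[str], diarize: bool = True) -> str:
    """Build a Nova-3 request URL with Smart Formatting, optional diarization, and keyterms."""
    if len(keyterms) > MAX_KEYTERMS:
        raise ValueError(f"{len(keyterms)} keyterms exceeds the limit of {MAX_KEYTERMS}")
    params = [("model", "nova-3"), ("smart_format", "true")]
    if diarize:
        params.append(("diarize", "true"))
    # Each term is sent as its own repeated `keyterm` parameter (assumption).
    params.extend(("keyterm", term) for term in keyterms)
    # urlencode percent-encodes values and encodes spaces as "+".
    return f"{LISTEN_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_listen_url(["Deepgram", "Nova-3", "APIbenchmarks Index"]))
```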

Official SDKs

  • Python (pip install deepgram-sdk)
  • JavaScript / TypeScript (@deepgram/sdk)
  • Go (deepgram-go-sdk)
  • .NET / C# (Deepgram NuGet)
  • Java (com.deepgram:deepgram-java-sdk)
  • Rust (community SDK)
  • CLI tool
  • REST API
  • WebSocket streaming API
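For teams calling the REST API directly rather than through an SDK, a minimal standard-library sketch follows. It assumes the `Authorization: Token <key>` header scheme, a JSON body whose `url` field points at hosted audio, and a response whose transcript sits at `results.channels[0].alternatives[0].transcript`; these match Deepgram's public API reference at the time of writing but should be confirmed. The network call is isolated in `transcribe`, so request construction and parsing can be checked offline.

```python
import json
import urllib.request

# Assumed endpoint and query parameters; confirm in Deepgram's API reference.
LISTEN_URL = "https://api.deepgram.com/v1/listen?model=nova-3&smart_format=true"


def build_request(api_key: str, audio_url: str) -> urllib.request.Request:
    """POST request asking Deepgram to transcribe audio hosted at `audio_url`."""
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        LISTEN_URL,
        data=body,
        headers={"Authorization": f"Token {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def extract_transcript(response: dict) -> str:
    """Pull the top transcript from a pre-recorded response (assumed JSON layout)."""
    return response["results"]["channels"][0]["alternatives"][0]["transcript"]


def transcribe(api_key: str, audio_url: str, timeout: float = 60.0) -> str:
    """Send the request and return the transcript; needs network access and a valid key."""
    with urllib.request.urlopen(build_request(api_key, audio_url), timeout=timeout) as resp:
        return extract_transcript(json.load(resp))
```

Usage would look like `transcribe(os.environ["DEEPGRAM_API_KEY"], "https://example.com/call.wav")`, where the environment variable name and audio URL are hypothetical placeholders.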

Strengths & trade-offs

Strengths
  • +Very low streaming latency (sub-300ms), making it well-suited to real-time voice agents and live captioning
  • +Among the cheapest STT pricing at scale ($0.0065-$0.0077/min for Nova-3 pre-recorded, roughly $0.39-$0.46/hr), with per-second billing (no minute rounding)
  • +Clean, developer-friendly API with official SDKs and consistently praised documentation
  • +Keyterm prompting (up to 100 terms) lets developers boost domain/brand vocabulary accuracy without retraining
  • +Self-serve onboarding: no credit card and $200 free credit to start
  • +Full voice stack (STT, Aura TTS, Voice Agent API) plus self-hosted/Dedicated enterprise deployment options
Trade-offs
  • Real-world accuracy on third-party benchmarks (Artificial Analysis, ~18% WER) is far worse than the vendor-claimed 5.26% on Deepgram's own test set, so headline numbers overstate hard-audio performance
  • Struggles relative to competitors on strong accents, overlapping speech, and noisy multi-speaker audio
  • Language support is narrower/less accurate for less-common and non-English languages
  • Cost can climb sharply at very high audio volumes despite a low unit price
  • Model-maturity gaps: users report Nova-2 sometimes outperforming Nova-3, and feature parity issues between model versions
  • Higher concurrency requires moving off Pay-As-You-Go to Growth/Enterprise (with annual minimums)
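The per-second billing point in the strengths list matters most for short utterances, which are typical of voice-agent traffic. The sketch below compares exact per-second billing with a hypothetical round-up-to-the-minute scheme, using the $0.0048/min Pay As You Go streaming rate from the pricing table. The round-up scheme is illustrative only and is not any specific competitor's documented policy.

```python
import math

RATE_PER_MIN = 0.0048  # Nova-3 monolingual streaming, Pay As You Go (from the pricing table)


def per_second_cost(seconds: int, rate_per_min: float = RATE_PER_MIN) -> float:
    """Bill exactly the seconds used."""
    return seconds / 60 * rate_per_min


def rounded_minute_cost(seconds: int, rate_per_min: float = RATE_PER_MIN) -> float:
    """Hypothetical alternative: every call is rounded up to a whole minute."""
    return math.ceil(seconds / 60) * rate_per_min


if __name__ == "__main__":
    # 10,000 short voice-agent turns of 8 seconds each.
    calls, seconds = 10_000, 8
    exact = calls * per_second_cost(seconds)
    rounded = calls * rounded_minute_cost(seconds)
    print(f"per-second: ${exact:.2f}  rounded-up minutes: ${rounded:.2f}")
```

For 10,000 eight-second turns, per-second billing comes to about $6.40, while rounding each turn up to a minute would cost $48.00, a 7.5x difference driven entirely by billing granularity.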

What developers say

G2 4.6/5 (~325 reviews)

Developers broadly praise Deepgram for speed, real-time streaming, easy integration, and clear docs, while critiques center on accuracy with accents/noisy multi-speaker audio, limited non-English language support, and cost at high volume.

"Deepgram provides very accurate and fast speech-to-text transcription, even for long audio recordings and real-time streams; the API is easy to integrate and the documentation is clear and developer-friendly."

Key figures

Metric | Value | Source
Batch WER (Nova-3, vendor test set) | 5.26% | Deepgram (Introducing Nova-3)
Streaming median WER (Nova-3, vendor test set) | 6.84% | Deepgram (Introducing Nova-3)
Real-world word error rate (Nova-3, third-party) | ~18% | Artificial Analysis Speech-to-Text Index
Streaming WER reduction vs. next-best competitor (vendor claim) | 54.2% | Deepgram (Introducing Nova-3)
Uptime SLA (Dedicated/Enterprise) | 99.9% | Deepgram Dedicated
Nova-3 monolingual price (pre-recorded, PAYG) | $0.0077/min | Deepgram pricing page
Concurrency scaling (contact center) | 1,000 to 140,000 simultaneous calls | Deepgram
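A 99.9% uptime SLA is easier to reason about as a downtime budget. The conversion below is plain arithmetic; how Deepgram's Dedicated SLA defines the measurement window, exclusions, and service credits is not covered here and should be read from the contract itself.

```python
def downtime_budget_minutes(sla_percent: float, days: float) -> float:
    """Maximum downtime (minutes) allowed over `days` at the given uptime percentage."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - sla_percent) / 100


if __name__ == "__main__":
    for label, days in [("30-day month", 30), ("365-day year", 365)]:
        print(f"99.9% over a {label}: {downtime_budget_minutes(99.9, days):.1f} min")
```

At 99.9%, that works out to about 43.2 minutes per 30-day month, or 525.6 minutes (8.76 hours) per year.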

Sources

  1. https://deepgram.com/pricing
  2. https://deepgram.com/learn/introducing-nova-3-speech-to-text-api
  3. https://developers.deepgram.com/sdks/sdk-features
  4. https://artificialanalysis.ai/speech-to-text/models/deepgram
  5. https://www.g2.com/products/deepgram/reviews
  6. https://status.deepgram.com/
  7. https://deepgram.com/dedicated
  8. https://developers.deepgram.com/docs/keyterm
  9. https://diyai.io/ai-tools/speech-to-text/reviews/deepgram-ai-review/

Figures last verified 2026-06-27. Spotted an error? corrections@apibenchmarks.com