APIbenchmarks
Gladia

Gladia · Ranked #7 of 8 in Speech-to-Text APIs

74.0 / 100
Grade C (Solid)

European audio-infrastructure challenger offering Whisper-grade accuracy, with all features (diarization, translation, code-switching) included at every tier.

Best for

All-inclusive STT for voice products

Overview

Gladia is a Paris-based audio-AI company (founded 2022) offering a speech-to-text API positioned as production infrastructure for voice products: meeting assistants, contact centers, voice agents, and call analytics. Its core differentiator is multilingual breadth and real-world robustness. The current flagship models, Solaria-1 and the newer Solaria-3, support 100+ languages with native code-switching, automatic language detection, speaker diarization, translation, sentiment analysis, and PII redaction bundled into the base price rather than charged as separate add-ons. The company raised a $16M (€14.5M) Series A in October 2024 led by XAnge, explicitly to push into real-time/streaming transcription, which it frames as the next frontier beyond Whisper-style batch transcription.

On accuracy, Gladia publishes benchmarks on third-party datasets in which Solaria-3 leads: 6.4% WER on Earnings22 business audio (ahead of AssemblyAI at 6.9%, ElevenLabs at 7.7%, Speechmatics at 7.8%, and Deepgram at 12.0%) and, at 33.9%, the only model tested under 35% WER on the noisy Switchboard telephone set. For real-time, it advertises ~270 ms average latency; the Solaria-1 announcement breaks this down as 103 ms for the fastest partials and 698 ms for finals. These are vendor-run benchmarks, so they should be read as favorable-but-credible rather than independent. Pricing is aggressive and transparent: a genuinely usable 10-hour/month free tier, pay-as-you-go Starter at $0.61/hr async and $0.75/hr real-time, and committed Growth pricing dropping to roughly $0.20/hr async, undercutting Deepgram, AssemblyAI, and Speechmatics on list price while including audio-intelligence features.

Gladia's sweet spot is European/multilingual, telephony-heavy, and EU-compliance-sensitive workloads (GDPR/HIPAA/SOC 2 Type 2, zero-data-retention and EU hosting on Enterprise). The main tradeoffs are the lighter ecosystem versus US incumbents (fewer official SDKs, a smaller community, less third-party review volume), some real-time tuning rough edges (e.g. a 5-second minimum endpointing window that frustrates dictation use cases), and the fact that most published accuracy claims originate from Gladia itself. For teams that need many languages, clean diarization, and competitive per-hour cost without paying à la carte for features, it's a strong contender; teams wanting the deepest tooling/community or fully independent benchmarks may still default to Deepgram or AssemblyAI.

How this score is derived

The APIbenchmarks Index is a weighted sum of four dimensions, each scored on an absolute 0–100 reference scale. See the methodology for every mapping.

Dimension | Score | Weight | Contribution | Rationale
Documentation & DX | 78 | 30% | 23.4 | Solid developer docs at docs.gladia.io with REST/WebSocket references plus a large library of step-by-step integration blog guides (Node.js websockets, async SDK, meeting bots), though less encyclopedic than Deepgram/AssemblyAI.
Reliability | 70 | 25% | 17.5 | Markets GDPR/HIPAA/SOC 2 Type 2 compliance with SLAs, zero data retention, and dedicated infrastructure on Enterprise, but publishes no public status-page uptime percentage, so reliability claims rest on contractual SLAs rather than transparent metrics.
Ecosystem & SDKs | 62 | 25% | 15.5 | Available on AWS Marketplace and via official Python and Node.js/JavaScript SDKs and a community Python client (GladiaPy), but the third-party integration and community footprint is smaller than that of US incumbents.
Accessibility | 88 | 20% | 17.6 | Low barrier to entry: self-serve signup, a 10-hour/month free tier, Stripe card billing, and pay-as-you-go pricing with no mandatory sales call until Enterprise.
APIbenchmarks Index (ABI) | | | 74.0 |

Table 1. Derivation of the ABI for Gladia. Contribution = score × weight; the index is their sum.
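
The arithmetic behind Table 1 can be reproduced in a few lines. The following is a minimal sketch, not the site's actual scoring code: the scores and weights are copied from the table, while the names DIMENSIONS and abi are illustrative assumptions.

```python
# Scores (0-100) and weights copied from Table 1.
# The names DIMENSIONS and abi are hypothetical, chosen for this sketch.
DIMENSIONS = {
    "Documentation & DX": (78, 0.30),
    "Reliability": (70, 0.25),
    "Ecosystem & SDKs": (62, 0.25),
    "Accessibility": (88, 0.20),
}


def abi(dimensions):
    """Return (per-dimension contributions, index) for a score/weight table."""
    total_weight = sum(weight for _, weight in dimensions.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # Contribution = score x weight, rounded to one decimal as in the table.
    contributions = {
        name: round(score * weight, 1)
        for name, (score, weight) in dimensions.items()
    }
    return contributions, round(sum(contributions.values()), 1)


contributions, index = abi(DIMENSIONS)
for name, value in contributions.items():
    print(f"{name}: {value}")
print(f"ABI: {index}")
```

The four contributions (23.4, 17.5, 15.5, 17.6) sum to the published index of 74.0.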

At a glance

Vendor: Gladia
Pricing model: Per hour (usage-based)
Free tier: 10 hrs/mo
Official SDKs: 2 (Python, Node.js/JavaScript)

Pricing

Tier | Price | What's included
Free | $0 (10 hrs/month) | Up to 10 hours of transcription per month free; access to core features to evaluate the API.
Starter (Pay-as-you-go) | $0.61/hr async, $0.75/hr real-time | 30 concurrent real-time / 25 concurrent async requests; language detection, diarization, 100+ languages, GDPR/HIPAA/SOC 2 Type 2.
Growth (Commitment) | from $0.20/hr async, $0.25/hr real-time | Everything in Starter plus volume-based discounts (up to ~67% off Starter), flexible concurrency, model-training opt-out.
Enterprise | Custom (annual) | Unlimited concurrency, custom/dedicated models, custom hosting, zero data retention, SLAs, dedicated Slack + account manager, premium support.
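
To see where the "up to ~67% off" figure comes from and what the tiers mean at a given volume, here is a rough estimator. It is a sketch under stated assumptions: it uses only the list prices quoted above, treats the Growth "from" prices as a best-case floor, and ignores free-tier hours, commitment minimums, and taxes. The names RATES, monthly_cost, and discount_vs_starter are illustrative, not part of any Gladia tooling.

```python
# List prices in USD per audio hour, copied from the pricing tiers above.
# "growth_floor" is the lowest advertised ("from") Growth rate, i.e. a best case.
RATES = {
    "starter": {"async": 0.61, "realtime": 0.75},
    "growth_floor": {"async": 0.20, "realtime": 0.25},
}


def monthly_cost(hours, tier, mode):
    """Estimated monthly spend in USD for a number of audio hours."""
    return round(hours * RATES[tier][mode], 2)


def discount_vs_starter(mode):
    """Percent saved at the Growth floor rate relative to Starter."""
    starter = RATES["starter"][mode]
    floor = RATES["growth_floor"][mode]
    return round(100 * (1 - floor / starter), 1)


for mode in ("async", "realtime"):
    print(
        mode,
        monthly_cost(1000, "starter", mode),
        monthly_cost(1000, "growth_floor", mode),
        f"{discount_vs_starter(mode)}% off",
    )
```

At 1,000 hours a month of async audio this works out to $610 on Starter versus $200 at the Growth floor, a discount of about 67%. Actual Growth rates depend on the committed volume.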

Key features

  • Solaria-1 / Solaria-3 multilingual STT models supporting 100+ languages
  • Asynchronous (pre-recorded file) transcription via REST upload
  • Real-time streaming transcription over WebSocket with VAD
  • Automatic language detection and code-switching mid-utterance
  • Speaker diarization with speaker-level confidence and word-level timestamps
  • Translation into 100+ target languages
  • Sentiment analysis (up to ~25 emotions) and audio intelligence
  • PII redaction for GDPR-sensitive workflows
  • Custom vocabulary / domain terminology support
  • GDPR, HIPAA, and SOC 2 Type 2 compliance with zero-data-retention options

Official SDKs

  • Python (official SDK)
  • Node.js / JavaScript (official SDK)
  • WebSocket streaming reference clients
  • REST API (language-agnostic)
  • GladiaPy (community Python client)
  • AWS Marketplace listing

Strengths & trade-offs

Strengths
  • Audio-intelligence features (diarization, translation, sentiment, PII redaction, language detection) are bundled into the base per-hour price rather than billed as separate add-ons
  • Strong accuracy on third-party datasets with Solaria-3: 6.4% WER on Earnings22 business audio, beating AssemblyAI, ElevenLabs, Speechmatics, and Deepgram
  • Very competitive pricing with a real 10-hour/month free tier and committed rates as low as ~$0.20/hr async
  • Genuinely broad multilingual coverage (100+ languages) with native code-switching, a notable edge for European and mixed-language audio
  • Low real-time latency (~270 ms average; 103 ms for the fastest partials and 698 ms for finals per the Solaria-1 announcement), suitable for live agents and contact centers
  • EU-based with GDPR/HIPAA/SOC 2 Type 2 compliance, EU hosting, and zero-data-retention options for privacy-sensitive workloads
Trade-offs
  • Most published accuracy/latency benchmarks are vendor-run by Gladia itself, with limited fully independent third-party validation
  • Smaller ecosystem and community than US incumbents (fewer official SDKs, less third-party tooling and review volume)
  • No public status-page uptime figures; reliability rests on contractual SLAs rather than transparent metrics
  • Real-time tuning rough edges, e.g. a 5-second minimum endpointing window that makes live dictation feel slow
  • Enterprise/high-volume pricing requires a custom quote and is less transparent than the self-serve tiers
  • Younger company (founded 2022, single Series A) versus more established speech-to-text vendors

What developers say

G2 4.8/5 (23 reviews)

Users consistently praise Gladia for high accuracy, fast low-latency transcription, and strong multilingual handling of messy real-world audio, calling it a reliable choice for speech-to-text applications. Occasional gripes concern pricing clarity at scale and real-time endpointing behavior.

Key figures

Metric | Value | Source
WER, Earnings22 (business audio) | 6.4% (1st vs AssemblyAI 6.9%, ElevenLabs 7.7%, Speechmatics 7.8%, Deepgram 12.0%) | Gladia Solaria-3 announcement
WER, Switchboard (telephone/conversational) | 33.9% (only model tested under 35%) | Gladia Solaria-3 announcement
WER, noisy audio | 1.4% (vs AssemblyAI 2.1%, Deepgram 3.2%, ElevenLabs 4.0%) | Gladia Solaria-3 announcement
WER, internal English customer-call dataset | 9.6% (26% better than Solaria-1's 12.9%) | Gladia Solaria-3 announcement
Real-time latency (avg partial) | ~270 ms average (103 ms optimal partial, 698 ms final) | Gladia Solaria-1 announcement
Async price | $0.61/hr Starter, from $0.20/hr Growth | Gladia pricing page
Real-time price | $0.75/hr Starter, from $0.25/hr Growth | Gladia pricing page
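
For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below shows the standard calculation. It is not Gladia's evaluation harness, and its normalization (lowercasing and whitespace splitting) is a simplifying assumption; published benchmarks usually apply heavier text normalization, so numbers from this function will not match vendor figures exactly. The function names are illustrative.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(old_wer: float, new_wer: float) -> float:
    """Percent reduction in WER going from old_wer to new_wer."""
    return 100 * (old_wer - new_wer) / old_wer


print(wer("a b c d", "a x c"))
print(round(relative_improvement(12.9, 9.6)))
```

The second helper reproduces the "26% better" claim in the table: moving from 12.9% to 9.6% WER is a relative reduction of about 25.6%, which rounds to 26%.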

Sources

  1. https://www.gladia.io/pricing
  2. https://www.gladia.io/blog/solaria-3-speech-to-text-model-for-european-languages
  3. https://www.gladia.io/blog/introducing-solaria-the-first-truly-universal-speech-to-text-model
  4. https://www.g2.com/products/gladia/reviews
  5. https://techcrunch.com/2024/10/15/gladia-believes-real-time-processing-is-the-next-frontier-of-audio-transcription-apis/
  6. https://sifted.eu/articles/gladia-raise-ai-france-news
  7. https://www.gladia.io/blog/measuring-latency-in-stt
  8. https://aws.amazon.com/marketplace/reviews/reviews-list/prodview-hrjyzqt2qpexe
  9. https://www.capterra.com/p/10019495/Gladia/

Figures last verified 2026-06-27. Spotted an error? corrections@apibenchmarks.com