OpenAI TTS
OpenAI · Ranked #4 of 7 in Text-to-Speech APIs
Voice as a feature of the OpenAI platform: a dead-simple endpoint and ubiquitous SDKs, but thin dedicated voice tooling (custom voices gated to eligible organizations, no SLA on the default Standard tier).
TTS bundled into the OpenAI API

Overview
OpenAI's Text-to-Speech API is a developer-facing speech-synthesis offering delivered through the `/v1/audio/speech` endpoint and the same SDK ecosystem as the rest of the OpenAI platform. It spans three models across two generations: the original tts-1 (latency-optimized) and tts-1-hd (quality-optimized), both priced per character, and the March 2025 gpt-4o-mini-tts, a token-priced, "steerable" model that lets developers prompt not just what is said but how it is said, controlling accent, emotion, tone, pacing and whispering through a free-text instructions field. The API offers 13 built-in voices (with marin and cedar billed as the highest quality), outputs MP3, Opus, AAC, FLAC, WAV or PCM, streams audio via chunked transfer encoding, and follows Whisper's broad multilingual coverage (90+ languages). The headline pitch is integration speed: if you are already on OpenAI APIs, TTS is a drop-in REST call with predictable pricing.
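As a rough illustration of the "drop-in REST call" described above, the sketch below builds (but does not send) a request for the steerable model using only the Python standard library. The host `api.openai.com`, the JSON field names (`model`, `input`, `voice`, `instructions`, `response_format`) and the helper name `build_speech_request` are assumptions made for this example; check them against the official API reference before relying on them.

```python
import json
import urllib.request

# Endpoint path comes from the overview; the host is an assumption.
SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text, api_key, voice="marin",
                         instructions=None, response_format="mp3"):
    """Build, without sending, a POST request for gpt-4o-mini-tts (hypothetical helper)."""
    payload = {
        "model": "gpt-4o-mini-tts",
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    if instructions:
        # Free-text steering of accent, tone, pacing and so on.
        payload["instructions"] = instructions
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_speech_request(
    "Your order has shipped.",
    api_key="sk-placeholder",  # placeholder, not a real key
    instructions="Speak warmly, at a relaxed pace.",
)
```

Sending the request (for example with `urllib.request.urlopen(req)`) should return the encoded audio bytes in the response body. The official SDKs wrap the same call.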
The product's center of gravity is convenience rather than best-in-class voice fidelity. Independent and community comparisons consistently frame OpenAI TTS as offering "good enough" voice quality that ships fast, in contrast to ElevenLabs' superior emotional range and voice cloning. OpenAI reports its newest snapshot delivers roughly 35% lower word error rate on Common Voice and FLEURS, and one third-party preference test put OpenAI TTS on top at a 42.93% preference rate with 87.13% pronunciation accuracy. Where it loses is real-time latency and voice-cloning depth: benchmarks show tts-1-hd P50 latency over one second (not viable for live voice agents) and a time-to-first-audio that trails ElevenLabs and Cartesia. Custom/cloned voices exist but are gated to eligible organizations with a consent-recording requirement.
For teams building latency-critical voice agents, OpenAI now points to the Realtime API and its newer realtime TTS models rather than the classic speech endpoint. The classic TTS API is best understood as a pragmatic batch or near-real-time synthesis tool: simple, cheap, well-documented, multilingual, and tightly integrated with the broader OpenAI stack. The costs are a 4,096-character per-request limit (a recurring forum complaint) and voices that sound competent but less expressive than those of dedicated voice specialists. A contractual 99.9% uptime SLA applies only on the Scale/Priority tiers, not the default Standard tier.
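The 4,096-character limit means long text has to be split client-side before synthesis. The following is a minimal sketch of one way to do that, preferring sentence boundaries and falling back to a hard cut for oversized sentences. The function name `chunk_text` and the sentence-splitting regex are illustrative choices for this example, not part of any OpenAI SDK.

```python
import re

MAX_CHARS = 4096  # per-request limit cited above


def chunk_text(text, limit=MAX_CHARS):
    """Split text into pieces of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that is itself over the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this sentence would overflow: close the chunk, start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


parts = chunk_text("One. Two. Three.", limit=9)
# parts == ["One. Two.", "Three."]
```

Each chunk can then be sent as its own request and the resulting audio segments concatenated. Splitting at sentence ends tends to avoid audible seams mid-phrase, though that is a heuristic rather than a guarantee.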
How this score is derived
The APIbenchmarks Index is a weighted sum of four dimensions, each scored on an absolute 0–100 reference scale. See the methodology for every mapping.
| Dimension | Score | Weight | Contribution |
|---|---|---|---|
| Documentation & DX: Clear, example-rich official docs at developers.openai.com cover models, voices, formats, streaming and the instructions field, with copy-paste Python/Node/cURL snippets. | 85 | 30% | 25.5 |
| Reliability: A central status page and historical uptime exist, but a contractual 99.9% uptime SLA applies only to Scale Tier/Priority Processing; the default Standard tier has no guaranteed latency or uptime. | 80 | 25% | 20.0 |
| Ecosystem & SDKs: Backed by the full OpenAI SDK family plus distribution through Azure OpenAI, and integrations across frameworks (LangChain, Mastra, etc.), so it slots into existing OpenAI stacks with near-zero friction. | 86 | 25% | 21.5 |
| Accessibility: REST endpoint plus official Python and JavaScript/Node SDKs make it one of the easiest TTS APIs to adopt, though custom voices are restricted to eligible organizations. | 78 | 20% | 15.6 |
| APIbenchmarks Index (ABI) | 82.6 | | |
Table 1. Derivation of the ABI for OpenAI TTS. Contribution = score × weight; the index is their sum.
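The arithmetic in Table 1 can be reproduced in a few lines. This sketch copies the scores and weights from the table; the names `DIMENSIONS` and `abi` are illustrative only.

```python
# (score, weight in percent) pairs copied from Table 1.
DIMENSIONS = {
    "Documentation & DX": (85, 30),
    "Reliability": (80, 25),
    "Ecosystem & SDKs": (86, 25),
    "Accessibility": (78, 20),
}


def abi(dimensions):
    """Return per-dimension contributions and their sum (the index)."""
    contributions = {
        name: score * weight_pct / 100
        for name, (score, weight_pct) in dimensions.items()
    }
    return contributions, sum(contributions.values())


contributions, index = abi(DIMENSIONS)
print(round(index, 1))  # 82.6
```

Because the weights sum to 100%, the index stays on the same 0-100 scale as the individual dimension scores.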
At a glance
- Vendor: OpenAI
- Pricing model: Per 1M characters / tokens
- Free tier: No
- Official SDKs: 5 languages
Pricing
| Model | Price | Notes |
|---|---|---|
| tts-1 (standard) | $15 / 1M characters | Latency-optimized model; ~$0.015 per 1K characters. Per-character billing. |
| tts-1-hd | $30 / 1M characters | Higher-fidelity, quality-optimized model; ~$0.030 per 1K characters. Per-character billing. |
| gpt-4o-mini-tts | $0.60 / 1M text input tokens + $12 / 1M audio output tokens | Token-based pricing; OpenAI estimates ~$0.015 per minute of generated audio. Newest, steerable model. |
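To make the rates above concrete, here is a small cost-estimation sketch. It uses the per-character prices for tts-1 and tts-1-hd and, for gpt-4o-mini-tts, the approximate per-minute figure quoted in the table rather than exact token counts, so the last helper is only a ballpark. The function names are hypothetical.

```python
# Rates in USD, copied from the pricing table above.
PER_MILLION_CHARS = {"tts-1": 15.00, "tts-1-hd": 30.00}
GPT_4O_MINI_TTS_PER_MINUTE = 0.015  # OpenAI's estimate, not a billed unit


def char_cost(model, n_chars):
    """Cost of synthesizing n_chars characters on a per-character model."""
    return PER_MILLION_CHARS[model] * n_chars / 1_000_000


def minute_cost(minutes):
    """Approximate gpt-4o-mini-tts cost for a given audio duration."""
    return GPT_4O_MINI_TTS_PER_MINUTE * minutes


full_request = char_cost("tts-1", 4096)     # one maximum-length request
hd_batch = char_cost("tts-1-hd", 250_000)   # a 250K-character batch
hour_of_audio = minute_cost(60)             # one hour on gpt-4o-mini-tts
```

Under these rates a maximum-length tts-1 request costs about six cents, and an hour of gpt-4o-mini-tts audio comes to roughly ninety cents by OpenAI's own estimate.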
Key features
- Three models: tts-1 (low latency), tts-1-hd (high fidelity), gpt-4o-mini-tts (steerable)
- 13 built-in voices including alloy, echo, fable, nova, shimmer, marin, cedar
- Instructions/steerability field to control accent, emotion, tone, speed, whispering (gpt-4o-mini-tts)
- Output formats: MP3, Opus, AAC, FLAC, WAV, PCM
- Real-time audio streaming via chunked transfer encoding
- 90+ language support following the Whisper model
- Custom Voices for eligible orgs (consent recording + 30s sample)
- OpenAI.fm interactive demo/playground for prototyping voices
- Available via Azure OpenAI in addition to OpenAI's direct API
Strengths & trade-offs
- +Dead-simple REST endpoint that mirrors OpenAI's other APIs; integration takes hours, not weeks, for teams already on the stack
- +Predictable, transparent per-character (tts-1/hd) pricing versus competitors' opaque credit systems
- +gpt-4o-mini-tts is steerable: prompt accent, emotion, tone, pacing and whispering via a free-text instructions field
- +Broad multilingual coverage (90+ languages, following Whisper) and 13 built-in voices
- +Multiple output formats (MP3, Opus, AAC, FLAC, WAV, PCM) plus chunked real-time streaming
- +Newest snapshot reports ~35% lower word error rate on Common Voice and FLEURS
- –4,096-character per-request limit on tts-1/hd is a recurring forum complaint and forces manual chunking for long text
- –High real-time latency: tts-1-hd P50 exceeds 1s and TTFA trails ElevenLabs/Cartesia, making it weak for live voice agents
- –Voice quality and emotional range lag dedicated specialists like ElevenLabs; voices sound competent but less expressive
- –No guaranteed uptime/latency SLA on the default Standard tier; the 99.9% SLA applies only on Scale/Priority tiers
- –Custom/cloned voices gated to eligible organizations with consent-recording requirements
- –Developers report the speed parameter being ignored and recent regressions adding unnatural per-word pauses
What developers say
Developers praise OpenAI TTS for ease of integration and price clarity while consistently noting it trades away voice expressiveness, real-time latency, and long-text handling versus dedicated voice specialists.
“OpenAI gives you dead-simple REST endpoints that work exactly like their other APIs. OpenAI TTS integration takes hours to days versus ElevenLabs taking days to weeks.”
Key figures
| Metric | Value | Source |
|---|---|---|
| Price (tts-1 standard) | $15 / 1M characters | OpenAI / pricing summaries |
| Price (tts-1-hd) | $30 / 1M characters | OpenAI / pricing summaries |
| Price (gpt-4o-mini-tts audio output) | $12 / 1M audio output tokens (~$0.015/min) | OpenAI next-gen audio models announcement |
| Word error rate improvement (latest snapshot) | ~35% lower WER on Common Voice and FLEURS | OpenAI / developer audio updates |
| Time to first audio (TTFA) | ~200ms (vs ElevenLabs ~150ms) | Cartesia comparison benchmark |
| Realtime TTS Arena ELO (OpenAI Realtime TTS 1) | 1,106 ELO | Artificial Analysis Realtime TTS Arena |
| Scale Tier uptime SLA | 99.9% (Scale/Priority tiers only) | OpenAI Scale Tier page |
Sources
- https://developers.openai.com/api/docs/guides/text-to-speech
- https://openai.com/index/introducing-our-next-generation-audio-models/
- https://platform.openai.com/docs/models/tts-1
- https://platform.openai.com/docs/models/gpt-4o-mini-tts
- https://amitkoth.com/elevenlabs-vs-openai-tts/
- https://community.openai.com/t/gpt-4o-mini-tts-speed-and-unnatural-voice/1371831
- https://www.cartesia.ai/vs/elevenlabs-vs-openai-tts
- https://openai.com/api-scale-tier/
- https://status.openai.com/
Figures last verified 2026-06-27. Spotted an error? corrections@apibenchmarks.com
