DeepSeek API
DeepSeek · Ranked #7 of 7 in LLM APIs
The price-leader for strong reasoning models, with OpenAI- and Anthropic-compatible endpoints and aggressive cache pricing.
Lowest-cost reasoning models

Overview
DeepSeek API is the first-party inference service from Chinese AI lab DeepSeek (deepseek.com), offering its open-weight frontier models through an OpenAI- and Anthropic-compatible HTTP interface. As of mid-2026 the current models are deepseek-v4-flash (cost-optimized, non-thinking and thinking modes) and deepseek-v4-pro (flagship reasoning), with the legacy deepseek-chat and deepseek-reasoner names kept as compatibility aliases routing to V4 Flash and scheduled for deprecation on 2026-07-24. The headline differentiator is price: V4 Flash runs about $0.14 per million input tokens (cache miss) and $0.28 output, with cache hits as low as $0.0028/M, roughly 90-95% cheaper than comparable OpenAI or Anthropic models, alongside a 1M-token context window and up to 384K output tokens. Because the API mirrors the OpenAI ChatCompletions schema, migration is effectively a two-line change and any OpenAI SDK works without a DeepSeek-specific package.
The product is aimed at cost-sensitive developers, RAG and agent builders, and teams that want frontier-ish reasoning and coding quality at commodity prices, plus those who value being able to self-host the same open weights as a fallback. On quality, DeepSeek V4 Pro lands near the top of open-weight reasoning models on the Artificial Analysis Intelligence Index (~44, #2 among open-weight reasoners behind Kimi K2.6) and posts strong coding/knowledge numbers (e.g. ~80.6% SWE-bench, 87-91% MMLU-Pro depending on mode), though some V4 figures are vendor-reported and await fuller third-party reproduction. Built-in automatic context caching (with explicit prompt_cache_hit/miss token accounting), JSON output, function/tool calling, and FIM completion round out a genuinely capable feature set.
The major weaknesses are operational and trust-related. DeepSeek uses dynamic, load-based rate limiting with no purchasable tier to raise it: under heavy platform load you get HTTP 429s that demand exponential backoff with jitter, and first-party throughput/latency on Artificial Analysis (DeepSeek-hosted V4 Flash ~106-124 t/s, with high time-to-first-token in reasoning mode) trails several third-party hosts like Makora, Together, and Azure. Reliability has been inconsistent, and the bigger blocker for many enterprises is data governance: data is processed on servers in China, which triggered regulatory bans (Italy's Garante, multiple government device bans) and persistent security/privacy criticism. For non-sensitive workloads where price-per-token dominates, it is compelling; for regulated or latency-critical production, many teams route the same open weights through a Western host instead.
How this score is derived
The APIbenchmarks Index is a weighted sum of four dimensions, each scored on an absolute 0–100 reference scale. See the methodology for every mapping.
| Dimension | Score | Weight | Contribution |
|---|---|---|---|
| Documentation & DXClean official docs at api-docs.deepseek.com cover quickstart, pricing, function calling, JSON mode, FIM, and context caching, with OpenAI/Anthropic-compatibility framing that makes onboarding fast. | 74 | 30% | 22.2 |
| ReliabilityDynamic load-based rate limiting (frequent 429s with no paid tier to raise limits) plus documented API-instability and partial-outage reports make first-party reliability a known weak spot. | 64 | 25% | 16.0 |
| Ecosystem & SDKsStrong reach via OpenAI/Anthropic API compatibility and open weights, so it is hosted by many third-party inference providers (Together, Fireworks, Azure, DeepInfra) and works with existing OpenAI SDKs and tooling. | 70 | 25% | 17.5 |
| AccessibilitySelf-serve signup, prepaid credits, and OpenAI-drop-in usage make it very approachable for developers, but data-residency-in-China and regulatory bans limit accessibility for regulated or geo-restricted organizations. | 85 | 20% | 17.0 |
| APIbenchmarks Index (ABI) | 72.7 | ||
Table 1. Derivation of the ABI for DeepSeek API. Contribution = score × weight; the index is their sum.
At a glance
- Vendor
- DeepSeek
- Pricing model
- Per token
- Free tier
- No
- Official SDKs
- 5 languages
Pricing
| deepseek-v4-flash (input, cache miss) | $0.14 / 1M tokens | Cost-optimized model; cache-hit input just $0.0028/1M. Legacy deepseek-chat/reasoner alias to this model. |
| deepseek-v4-flash (output) | $0.28 / 1M tokens | Output tokens for V4 Flash, thinking and non-thinking modes. |
| deepseek-v4-pro (input, cache miss) | $0.435 / 1M tokens | Flagship reasoning model; cache-hit input $0.003625/1M. |
| deepseek-v4-pro (output) | $0.87 / 1M tokens | Output tokens for V4 Pro. 1M-token context, up to 384K output. |
Key features
- •OpenAI ChatCompletions-compatible API
- •Anthropic Messages-format compatibility
- •Automatic on-disk context caching with prompt_cache_hit/miss token reporting
- •JSON output mode (OpenAI-compatible)
- •Function / tool calling in thinking and non-thinking modes
- •FIM (Fill-In-the-Middle) completion via beta base_url
- •Thinking (visible chain-of-thought) and non-thinking modes
- •1M-token context, up to 384K output tokens
- •Chat prefix completion
- •Streaming responses
Official SDKs
Strengths & trade-offs
- +Extremely low price per token, roughly 90-95% cheaper than comparable OpenAI/Anthropic models
- +Aggressive automatic context caching with cache-hit input as low as $0.0028/1M and explicit hit/miss token accounting
- +OpenAI- and Anthropic-API compatible, migrate with a two-line change using existing SDKs
- +1M-token context window with up to 384K output tokens
- +Open-weight models, so the same model can be self-hosted or run via many third-party providers as a fallback
- +Top-tier open-weight reasoning quality (V4 Pro ~#2 open-weight reasoner on Artificial Analysis Intelligence Index)
- –Dynamic, load-based rate limiting with no paid tier to raise limits; 429 errors spike under platform load
- –Reliability/instability issues and partial outages reported by third-party monitors
- –Data is processed on servers in China, triggering regulatory bans (Italy Garante) and enterprise data-governance concerns
- –Documented app-side security weaknesses (deprecated 3DES, hard-coded key) fuel trust concerns
- –First-party throughput and time-to-first-token trail several third-party hosts on Artificial Analysis
- –Some V4 benchmark figures are vendor-reported and not yet fully reproduced by third parties
What developers say
Developers praise DeepSeek's dramatic cost savings and open weights but repeatedly flag API instability/rate limits and serious data-privacy concerns tied to China-based processing.
“DeepSeek charges about 95 percent less for API access than OpenAI or Anthropic do for comparable models.”
Key figures
| Artificial Analysis Intelligence Index (V4 Pro, reasoning, max effort) | ~44 (#2 open-weight reasoner) | Artificial Analysis ↗ |
| Output speed, DeepSeek-hosted V4 Flash (reasoning) | 123.6 tokens/sec (P50) | Artificial Analysis ↗ |
| Output speed, DeepSeek-hosted V4 Flash (non-reasoning) | 106.9 tokens/sec (P50) | Artificial Analysis ↗ |
| Time to first token, DeepSeek-hosted V4 Flash (non-reasoning) | 1.37s (P50) | Artificial Analysis ↗ |
| Price, deepseek-v4-flash input (cache miss) / output | $0.14 / $0.28 per 1M tokens | DeepSeek API pricing ↗ |
| SWE-bench (V4 Pro, vendor-reported) | 80.6% | AIMadeTools / DeepSeek ↗ |
| MMLU-Pro (V4 Pro, Max mode, vendor-reported) | 91.2% | Artificial Analysis / DeepSeek ↗ |
Compare DeepSeek API head to head
Sources
- https://api-docs.deepseek.com/quick_start/pricing
- https://api-docs.deepseek.com/
- https://artificialanalysis.ai/models/deepseek-v4-pro
- https://artificialanalysis.ai/models/deepseek-v4-flash/providers
- https://artificialanalysis.ai/models/deepseek-v4-flash-non-reasoning/providers
- https://chat-deep.ai/docs/api-rate-limits/
- https://api7.ai/blog/analyzing-deepseek-api-instability
- https://krebsonsecurity.com/2025/02/experts-flag-security-privacy-risks-in-deepseek-ai-app/
- https://apistatuscheck.com/api/deepseek
Figures last verified 2026-06-27. Spotted an error? corrections@apibenchmarks.com
