xAI Grok API
xAI · Ranked #5 of 7 in LLM APIs
Frontier Grok models with large context and OpenAI/Anthropic-compatible SDKs, self-serve from a single key.
Frontier models, X-integrated

Overview
xAI's Grok API exposes the same Grok models that power the Grok assistant on X (Twitter) through an OpenAI- and Anthropic-SDK-compatible REST endpoint (api.x.ai). The lineup has evolved rapidly: Grok 4 launched in July 2025 at a premium $3/$15 per million input/output tokens, and by mid-2026 the flagship grok-4.3 sits at $1.25 input / $2.50 output per million (with cached input around $0.20), alongside reasoning, non-reasoning, multi-agent, and a cheaper "grok-build" coding SKU, plus separate Imagine (image/video) and Voice (TTS/STT/realtime) APIs. The standout architectural features are a 1M-token context window on the 4.x models, native tool/function calling, structured outputs, vision, and built-in Live Search / real-time access to X data, which is the genuine differentiator competitors cannot easily replicate.
On raw capability the models are competitive at the frontier: Grok 4 set state-of-the-art on ARC-AGI v2 (15.9%) and posted strong SWE-bench, GPQA (~89%) and Humanity's Last Exam results at launch, and Artificial Analysis still scores Grok 4 around 42 and grok-4.3 (high) at 38 on its Intelligence Index. The trade-off is latency and consistency rather than peak smarts: third-party measurement shows high time-to-first-token on the reasoning variants (14-18s) and moderate output speeds (~40-145 tok/s depending on model/provider), and the cheaper "Fast" variants drop well down the intelligence ranking. The pricing trajectory is aggressively downward, making 4.3 one of the better intelligence-per-dollar options at the frontier.
The main reservations are operational and reputational. The model and pricing catalog churns fast (models get renamed, retired, or moved to legacy/enterprise tiers), which is a real headache for anyone building durable integrations. Developers report confusing/unclear rate limits, occasional sluggishness, and, for the coding-agent use cases, hallucinated API endpoints and method signatures. Reliability is decent but not best-in-class (multi-region status page with a handful of short incidents monthly), and brand/safety controversies around Grok's behavior on X add reputational risk for enterprise buyers. Net: a fast-moving, increasingly price-competitive frontier API with a unique real-time-X-data edge, best suited to teams that value that data access and aggressive pricing over a stable, mature, deeply-documented platform.
How this score is derived
The APIbenchmarks Index is a weighted sum of four dimensions, each scored on an absolute 0–100 reference scale. See the methodology for every mapping.
| Dimension | Score | Weight | Contribution |
|---|---|---|---|
| Documentation & DXDocs at docs.x.ai are clean and OpenAI/Anthropic-SDK-compatible with quickstarts, but coverage of newer capabilities (Live Search, structured outputs per-model, rate limits) is uneven and lags the rapid model releases. | 80 | 30% | 24.0 |
| ReliabilityMulti-region status page (us-east-1, us-west-2, eu-west-1) shows generally operational service but roughly 4 short incidents in a recent 30-day window (avg ~27 min recovery), and no clearly published uptime SLA for self-serve. | 76 | 25% | 19.0 |
| Ecosystem & SDKsStrong reach via OpenAI/Anthropic SDK drop-in compatibility plus availability on OpenRouter, Azure, and tooling like LangChain, though the native first-party SDK story is thinner than OpenAI/Anthropic. | 74 | 25% | 18.5 |
| AccessibilitySelf-serve API keys via the xAI console with pay-as-you-go pricing and a data-sharing free-credits path, but unclear rate limits and a churning model catalog raise the friction for new integrators. | 82 | 20% | 16.4 |
| APIbenchmarks Index (ABI) | 77.9 | ||
Table 1. Derivation of the ABI for xAI Grok API. Contribution = score × weight; the index is their sum.
At a glance
- Vendor
- xAI
- Pricing model
- Per token
- Free tier
- No
- Official SDKs
- 7 languages
Pricing
| grok-4.3 (flagship) | $1.25 / $2.50 per 1M | Input / output per million tokens; cached input ~$0.20/1M. 1M-token context. |
| grok-4.20 reasoning / non-reasoning | $1.25 / $2.50 per 1M | Reasoning and standard variants on the 4.20 series, 1M context. |
| grok-build-0.1 (coding agent) | $1.00 / $2.00 per 1M | Lower-cost SKU aimed at the Grok Build terminal coding agent, 256k context. |
| Grok 4 (legacy / launch) | $3.00 / $15.00 per 1M | Original July 2025 flagship pricing; now legacy/enterprise as newer SKUs supersede it. |
| Grok Imagine API | $0.02–$0.05 / image, $0.05–$0.08 / sec video | Image and video generation pricing. |
| Grok Voice API | $0.05/min realtime; $15/1M chars TTS; ~$0.10/hr STT | Realtime audio, text-to-speech, and speech-to-text endpoints. |
Key features
- •1M-token context window (Grok 4.x)
- •Live Search / real-time X (Twitter) data access
- •Function / tool calling
- •Structured outputs (JSON schema)
- •Vision / image understanding
- •Reasoning and non-reasoning model variants
- •Multi-agent model SKU
- •Grok Imagine API for image and video generation
- •Grok Voice API (realtime audio, TTS, STT)
- •OpenAI- and Anthropic-API compatibility
Official SDKs
Strengths & trade-offs
- +Built-in Live Search and real-time access to X/Twitter data that competing LLM APIs cannot match
- +Large 1M-token context window on the Grok 4.x models
- +OpenAI- and Anthropic-SDK compatible, so migration is mostly a base-URL/key swap
- +Aggressively falling prices, grok-4.3 at $1.25/$2.50 is strong intelligence-per-dollar at the frontier
- +Frontier-level benchmark results (SOTA ARC-AGI v2, strong GPQA/SWE-bench at launch)
- +Native tool/function calling, structured outputs, vision, plus separate Imagine and Voice APIs
- –Model and pricing catalog churns fast, models get renamed, retired, or moved to legacy, breaking durable integrations
- –High time-to-first-token (~14–18s) on reasoning variants and only moderate output speed
- –Developers report unclear/confusing rate limits and occasional sluggish performance
- –Coding-agent variants can hallucinate non-existent API endpoints and method signatures
- –No clearly published self-serve uptime SLA; a handful of short incidents per month
- –Brand and content-safety controversies around Grok on X add enterprise reputational risk
What developers say
Developers praise the speed, real-time X data, and concise code output, but criticize confusing rate limits, fast-churning models/pricing, and coding-agent hallucinations.
“It doesn't over-explain, just gives me the code, used it to refactor a REST API quickly.”
Key figures
| Artificial Analysis Intelligence Index (Grok 4) | ~42 | Artificial Analysis ↗ |
| Artificial Analysis Intelligence Index (grok-4.3 high) | 38 | Artificial Analysis ↗ |
| Output speed (grok-4.3 high) | 139.3 tokens/sec | Artificial Analysis ↗ |
| Time to first token (grok-4.3 high) | 18.28 s | Artificial Analysis ↗ |
| ARC-AGI v2 score (Grok 4) | 15.9% (SOTA for closed models at launch) | xAI (Grok 4 announcement) ↗ |
| Flagship API price (grok-4.3) | $1.25 in / $2.50 out per 1M tokens | xAI pricing / docs ↗ |
| Recent reliability | ~4 incidents / 30 days, ~27 min avg recovery | AIWatch (xAI status tracking) ↗ |
Compare xAI Grok API head to head
Sources
- https://x.ai/api
- https://docs.x.ai/docs/models
- https://docs.x.ai/docs/overview
- https://x.ai/news/grok-4
- https://artificialanalysis.ai/models/grok-4-3
- https://artificialanalysis.ai/models/grok-4/providers
- https://status.x.ai/
- https://www.arsturn.com/blog/grok-4-rate-limits-why-premium-users-are-getting-frustrated
- https://jingrey.com/tools/grok-build-beta-review/
Figures last verified 2026-06-27. Spotted an error? corrections@apibenchmarks.com
