APIbenchmarks
xAI Grok API logo

xAI Grok API

xAI · Ranked #5 of 7 in LLM APIs

77.9/ 100
BStrong

Frontier Grok models with large context and OpenAI/Anthropic-compatible SDKs, self-serve from a single key.

Best for

Frontier models, X-integrated

Screenshot of xAI Grok API

Overview

xAI's Grok API exposes the same Grok models that power the Grok assistant on X (Twitter) through an OpenAI- and Anthropic-SDK-compatible REST endpoint (api.x.ai). The lineup has evolved rapidly: Grok 4 launched in July 2025 at a premium $3/$15 per million input/output tokens, and by mid-2026 the flagship grok-4.3 sits at $1.25 input / $2.50 output per million (with cached input around $0.20), alongside reasoning, non-reasoning, multi-agent, and a cheaper "grok-build" coding SKU, plus separate Imagine (image/video) and Voice (TTS/STT/realtime) APIs. The standout architectural features are a 1M-token context window on the 4.x models, native tool/function calling, structured outputs, vision, and built-in Live Search / real-time access to X data, which is the genuine differentiator competitors cannot easily replicate.

On raw capability the models are competitive at the frontier: Grok 4 set state-of-the-art on ARC-AGI v2 (15.9%) and posted strong SWE-bench, GPQA (~89%) and Humanity's Last Exam results at launch, and Artificial Analysis still scores Grok 4 around 42 and grok-4.3 (high) at 38 on its Intelligence Index. The trade-off is latency and consistency rather than peak smarts: third-party measurement shows high time-to-first-token on the reasoning variants (14-18s) and moderate output speeds (~40-145 tok/s depending on model/provider), and the cheaper "Fast" variants drop well down the intelligence ranking. The pricing trajectory is aggressively downward, making 4.3 one of the better intelligence-per-dollar options at the frontier.

The main reservations are operational and reputational. The model and pricing catalog churns fast (models get renamed, retired, or moved to legacy/enterprise tiers), which is a real headache for anyone building durable integrations. Developers report confusing/unclear rate limits, occasional sluggishness, and, for the coding-agent use cases, hallucinated API endpoints and method signatures. Reliability is decent but not best-in-class (multi-region status page with a handful of short incidents monthly), and brand/safety controversies around Grok's behavior on X add reputational risk for enterprise buyers. Net: a fast-moving, increasingly price-competitive frontier API with a unique real-time-X-data edge, best suited to teams that value that data access and aggressive pricing over a stable, mature, deeply-documented platform.

How this score is derived

The APIbenchmarks Index is a weighted sum of four dimensions, each scored on an absolute 0–100 reference scale. See the methodology for every mapping.

DimensionScoreWeightContribution
Documentation & DXDocs at docs.x.ai are clean and OpenAI/Anthropic-SDK-compatible with quickstarts, but coverage of newer capabilities (Live Search, structured outputs per-model, rate limits) is uneven and lags the rapid model releases.
80
30%24.0
ReliabilityMulti-region status page (us-east-1, us-west-2, eu-west-1) shows generally operational service but roughly 4 short incidents in a recent 30-day window (avg ~27 min recovery), and no clearly published uptime SLA for self-serve.
76
25%19.0
Ecosystem & SDKsStrong reach via OpenAI/Anthropic SDK drop-in compatibility plus availability on OpenRouter, Azure, and tooling like LangChain, though the native first-party SDK story is thinner than OpenAI/Anthropic.
74
25%18.5
AccessibilitySelf-serve API keys via the xAI console with pay-as-you-go pricing and a data-sharing free-credits path, but unclear rate limits and a churning model catalog raise the friction for new integrators.
82
20%16.4
APIbenchmarks Index (ABI)77.9

Table 1. Derivation of the ABI for xAI Grok API. Contribution = score × weight; the index is their sum.

At a glance

Vendor
xAI
Pricing model
Per token
Free tier
No
Official SDKs
7 languages

Pricing

grok-4.3 (flagship)$1.25 / $2.50 per 1MInput / output per million tokens; cached input ~$0.20/1M. 1M-token context.
grok-4.20 reasoning / non-reasoning$1.25 / $2.50 per 1MReasoning and standard variants on the 4.20 series, 1M context.
grok-build-0.1 (coding agent)$1.00 / $2.00 per 1MLower-cost SKU aimed at the Grok Build terminal coding agent, 256k context.
Grok 4 (legacy / launch)$3.00 / $15.00 per 1MOriginal July 2025 flagship pricing; now legacy/enterprise as newer SKUs supersede it.
Grok Imagine API$0.02–$0.05 / image, $0.05–$0.08 / sec videoImage and video generation pricing.
Grok Voice API$0.05/min realtime; $15/1M chars TTS; ~$0.10/hr STTRealtime audio, text-to-speech, and speech-to-text endpoints.

Key features

  • 1M-token context window (Grok 4.x)
  • Live Search / real-time X (Twitter) data access
  • Function / tool calling
  • Structured outputs (JSON schema)
  • Vision / image understanding
  • Reasoning and non-reasoning model variants
  • Multi-agent model SKU
  • Grok Imagine API for image and video generation
  • Grok Voice API (realtime audio, TTS, STT)
  • OpenAI- and Anthropic-API compatibility

Official SDKs

Python (OpenAI SDK compatible)JavaScript / TypeScript (OpenAI SDK compatible)Anthropic SDK compatibleREST / cURLAvailable via OpenRouterAvailable on Microsoft Azure AI FoundryLangChain integration

Strengths & trade-offs

Strengths
  • +Built-in Live Search and real-time access to X/Twitter data that competing LLM APIs cannot match
  • +Large 1M-token context window on the Grok 4.x models
  • +OpenAI- and Anthropic-SDK compatible, so migration is mostly a base-URL/key swap
  • +Aggressively falling prices, grok-4.3 at $1.25/$2.50 is strong intelligence-per-dollar at the frontier
  • +Frontier-level benchmark results (SOTA ARC-AGI v2, strong GPQA/SWE-bench at launch)
  • +Native tool/function calling, structured outputs, vision, plus separate Imagine and Voice APIs
Trade-offs
  • Model and pricing catalog churns fast, models get renamed, retired, or moved to legacy, breaking durable integrations
  • High time-to-first-token (~14–18s) on reasoning variants and only moderate output speed
  • Developers report unclear/confusing rate limits and occasional sluggish performance
  • Coding-agent variants can hallucinate non-existent API endpoints and method signatures
  • No clearly published self-serve uptime SLA; a handful of short incidents per month
  • Brand and content-safety controversies around Grok on X add enterprise reputational risk

What developers say

Developers praise the speed, real-time X data, and concise code output, but criticize confusing rate limits, fast-churning models/pricing, and coding-agent hallucinations.

It doesn't over-explain, just gives me the code, used it to refactor a REST API quickly.

Key figures

Artificial Analysis Intelligence Index (Grok 4)~42Artificial Analysis
Artificial Analysis Intelligence Index (grok-4.3 high)38Artificial Analysis
Output speed (grok-4.3 high)139.3 tokens/secArtificial Analysis
Time to first token (grok-4.3 high)18.28 sArtificial Analysis
ARC-AGI v2 score (Grok 4)15.9% (SOTA for closed models at launch)xAI (Grok 4 announcement)
Flagship API price (grok-4.3)$1.25 in / $2.50 out per 1M tokensxAI pricing / docs
Recent reliability~4 incidents / 30 days, ~27 min avg recoveryAIWatch (xAI status tracking)

Compare xAI Grok API head to head

Sources

  1. https://x.ai/api
  2. https://docs.x.ai/docs/models
  3. https://docs.x.ai/docs/overview
  4. https://x.ai/news/grok-4
  5. https://artificialanalysis.ai/models/grok-4-3
  6. https://artificialanalysis.ai/models/grok-4/providers
  7. https://status.x.ai/
  8. https://www.arsturn.com/blog/grok-4-rate-limits-why-premium-users-are-getting-frustrated
  9. https://jingrey.com/tools/grok-build-beta-review/

Figures last verified 2026-06-27. Spotted an error? corrections@apibenchmarks.com