APIbenchmarks

Turbopuffer

Turbopuffer · Ranked #7 of 7 in Vector Database APIs

67.7 / 100 · C (Solid)

Object-storage-first serverless search powering Cursor and Notion; extremely cheap at scale, but with no free tier.

Best for

Cheap serverless search on object storage

Overview

Turbopuffer is a serverless vector and full-text search engine built on object storage (S3/GCS), founded by ex-Shopify engineer Simon Eskildsen. Its core architectural bet is to keep the system of record on cheap object storage (~$20/TB/month) while promoting hot data to NVMe SSD and RAM caches based on access patterns. This three-tier hierarchy lets it claim in-memory-class latency on warm queries (~10ms p90 for vector, ~18ms p90 for BM25 on 1M docs) at roughly 10x lower cost than RAM-resident incumbents like Pinecone. The tradeoff is high cold-query latency (~444ms p90) on first access and relatively high write latency (~248ms p90 for 512KB upserts), since durability is anchored to object storage. It is production-proven at notable scale, powering Cursor, Notion, Linear, Ramp, and others, and the company reports handling 4T+ documents, 10M+ writes/s, and 25k+ queries/s across its fleet.
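
The three-tier read path described above can be pictured with a toy model. This is only an illustrative sketch, not Turbopuffer's implementation: the `TieredStore` class, its promote-one-tier-per-access policy, and the key name are all hypothetical, and real caches also evict data, which is omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class TieredStore:
    """Toy three-tier read path: RAM -> SSD -> object storage (hypothetical)."""

    object_storage: dict[str, bytes]                      # system of record
    ssd: dict[str, bytes] = field(default_factory=dict)   # warm cache
    ram: dict[str, bytes] = field(default_factory=dict)   # hot cache

    def read(self, key: str) -> tuple[bytes, str]:
        """Return (value, tier that served the read), promoting one tier up."""
        if key in self.ram:
            return self.ram[key], "ram"
        if key in self.ssd:
            value = self.ssd[key]
            self.ram[key] = value          # repeated access earns a RAM slot
            return value, "ssd"
        value = self.object_storage[key]   # cold read: the slowest path
        self.ssd[key] = value              # first touch promotes to SSD
        return value, "object_storage"


store = TieredStore(object_storage={"ns/acme": b"index-bytes"})
tiers = [store.read("ns/acme")[1] for _ in range(3)]
print(tiers)  # ['object_storage', 'ssd', 'ram']
```

The first read pays the cold penalty; later reads are served from progressively faster tiers, which is the behaviour behind the gap between the cold (~444ms p90) and warm (~10ms p90) figures quoted above.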

The ideal user is a team running large-scale RAG, semantic search, or hybrid (vector + BM25 + filter) retrieval where storage volume dominates cost and where occasional cold-start latency on rarely touched namespaces is acceptable. Multi-tenancy is a first-class concern: namespaces are cheap and isolate per-tenant data, making it a strong fit for AI products that shard search by customer. It is a poor fit for ultra-low-latency, always-hot workloads that need single-digit-ms tail latency on every request regardless of access pattern, and for teams that want a generous free tier: there is none, and the entry-level Launch plan carries a $16/month minimum (reduced from an earlier $64/month). Critics (notably Zilliz) argue that the object-storage tradeoff hides latency and operational consequences that storage cost per GB alone does not capture.

Community sentiment skews positive, citing operational simplicity, recall auto-tuning, and engineer-grade design; a common heuristic is "use pgvector until it breaks, then go to Turbopuffer." The main reservations are pricing relative to pgvector for cost-sensitive users, the absence of a free tier, and the inherent cold-query penalty. The product remained invite/waitlist-gated for much of its early life before broader GA, which slowed evaluation for some teams.

How this score is derived

The APIbenchmarks Index is a weighted sum of four dimensions, each scored on an absolute 0–100 reference scale. See the methodology for every mapping.

Dimension | Score | Weight | Contribution
Documentation & DX | 74 | 30% | 22.2
Reliability | 76 | 25% | 19.0
Ecosystem & SDKs | 58 | 25% | 14.5
Accessibility | 60 | 20% | 12.0
APIbenchmarks Index (ABI) | | | 67.7

  • Documentation & DX: Docs are clean and engineer-oriented, with a quickstart, documented guarantees, and transparent published benchmarks and pricing log, though some deep pages (e.g. /docs/benchmark) move or 404 and per-unit pricing is somewhat scattered.
  • Reliability: Production-proven at 4T+ documents and 10M+ writes/s for customers like Cursor and Notion. The 99.95% uptime SLA applies only to the Enterprise tier; lower tiers carry no contractual SLA.
  • Ecosystem & SDKs: Official Stainless-generated SDKs for Python, TypeScript, Go, Java, and Ruby, plus a clean HTTP API, give broad language coverage, though it lacks the large third-party integration/connector ecosystem of older incumbents.
  • Accessibility: Self-serve signup with usage-based pricing makes it approachable, but the lack of a free tier and a monthly minimum ($16 on Launch, previously $64) raise the bar versus open-source pgvector.

Table 1. Derivation of the ABI for Turbopuffer. Contribution = score × weight; the index is their sum.
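
The arithmetic behind Table 1 can be reproduced in a few lines. This is a minimal sketch using only the scores and weights listed above; the `DIMENSIONS` mapping and the `abi` function are illustrative names, not part of any APIbenchmarks tooling.

```python
# (score on the 0-100 scale, weight in percent), copied from Table 1
DIMENSIONS = {
    "Documentation & DX": (74, 30),
    "Reliability": (76, 25),
    "Ecosystem & SDKs": (58, 25),
    "Accessibility": (60, 20),
}


def abi(dimensions: dict[str, tuple[int, int]]) -> float:
    """Weighted sum of dimension scores; weights must total 100%."""
    if sum(weight for _, weight in dimensions.values()) != 100:
        raise ValueError("weights must sum to 100%")
    # Work in integer percent and divide once to avoid float drift.
    return sum(score * weight for score, weight in dimensions.values()) / 100


contributions = {name: s * w / 100 for name, (s, w) in DIMENSIONS.items()}
print(contributions)    # per-dimension contributions: 22.2, 19.0, 14.5, 12.0
print(abi(DIMENSIONS))  # 67.7
```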

At a glance

Vendor: Turbopuffer
Pricing model: Per GB queried/written + storage
Free tier: No
Official SDKs: 6 languages

Pricing

Plan | Minimum | Includes
Launch | $16/mo minimum | Self-serve usage-based plan with all core database features included; entry tier (previously a $64/month minimum).
Scale | $256/mo minimum | Adds HIPAA BAA, SSO, audit logs, and priority support on top of usage-based billing.
Enterprise | $4,096+/mo (≈35% usage premium) | Single-tenancy, BYOC, CMEK, private networking, 24/7 support, and a 99.95% uptime SLA.
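
To see how a monthly minimum interacts with usage-based billing, here is a rough sketch. It assumes the minimum acts as a simple floor on metered usage, which the listing above does not spell out, and it ignores the Enterprise usage premium; `PLAN_MINIMUM_USD` and `monthly_bill` are hypothetical names.

```python
# Monthly minimums from the pricing list above, in USD.
PLAN_MINIMUM_USD = {"Launch": 16.0, "Scale": 256.0, "Enterprise": 4096.0}


def monthly_bill(plan: str, metered_usage_usd: float) -> float:
    """Assumed rule: pay metered usage, but never less than the plan minimum."""
    return max(PLAN_MINIMUM_USD[plan], metered_usage_usd)


print(monthly_bill("Launch", 5.0))   # 16.0: small projects still pay the floor
print(monthly_bill("Launch", 40.0))  # 40.0: usage exceeds the floor
print(monthly_bill("Scale", 40.0))   # 256.0
```

Under this assumption, a project with only a few dollars of metered usage still pays the floor, which is why pgvector tends to be cheaper for small or early projects.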

Key features

  • Serverless vector search on object storage (S3/GCS) with NVMe + RAM caching tiers
  • Full-text BM25 search with text boosting
  • Hybrid dense + sparse vector search with >90% recall
  • Attribute filtering against an inverted index
  • Regex search via trigram indexes
  • First-class namespaces for multi-tenant isolation
  • Branching via copy-on-write clones
  • Configurable read-after-write consistency guarantees
  • Automatic recall tuning based on data

Official SDKs

  • Python
  • TypeScript / JavaScript
  • Go
  • Java
  • Ruby
  • HTTP REST API

Strengths & trade-offs

Strengths
  • Object-storage-first architecture cuts storage cost to ~$20–70/TB/month versus ~$1,600/TB/month for RAM-resident incumbents
  • Warm query latency competitive with in-memory engines (~10ms p90 vector, ~18ms p90 BM25 on 1M docs)
  • Native hybrid search: dense + sparse vectors, BM25 full-text, attribute filtering, and regex via trigram indexes
  • Cheap, first-class namespaces make per-tenant multi-tenancy economical at scale
  • Automatic recall tuning and >90% recall without manual index parameter wrangling
  • Proven in production at very large scale (Cursor, Notion, Linear, Ramp): 4T+ docs, 25k+ queries/s
Trade-offs
  • High cold-query latency (~444ms p90) when a namespace isn't cached, a poor fit for workloads that must be fast on every request
  • Relatively high write latency (~248ms p90 for 512KB upserts) due to object-storage durability path
  • No free tier and a monthly spend minimum, making pgvector cheaper for small/early projects
  • 99.95% uptime SLA is gated to the expensive Enterprise tier only
  • Smaller third-party integration ecosystem than older incumbents like Pinecone or Elastic
  • Critics argue the storage-cost framing understates real latency/operational tradeoffs of serverless object-storage search

What developers say

Developers praise its operational simplicity, recall auto-tuning, and engineer-grade design; the main reservations are the lack of a free tier, pricing versus pgvector, and the cold-query latency tradeoff.

"In terms of operational simplicity, turbopuffer just killing it [compared to Qdrant and Zilliz Cloud]."

Key figures

Metric | Value | Source
Warm vector query latency (1M 768-dim vectors) | ~10ms p90 | Turbopuffer official benchmark/blog
Cold vector query latency (1M 768-dim vectors) | ~444ms p90 | Turbopuffer official benchmark/blog
Warm BM25 full-text query latency (1M docs) | ~18ms p90 | Turbopuffer official benchmark/blog
Write latency (512KB upserts) | ~248ms p90 | Turbopuffer docs
Storage cost (S3 + SSD cache) | $70/TB/month (vs ~$1,600/TB/month incumbents) | Turbopuffer official blog
Enterprise uptime SLA | 99.95% | Turbopuffer pricing page
Production scale | 4T+ documents, 10M+ writes/s, 25k+ queries/s | Turbopuffer docs
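
For a sense of scale, the storage figures above can be turned into a quick back-of-the-envelope comparison. This sketch covers storage only, not query or write charges, and the 10 TB dataset size is an arbitrary example; the constant and function names are illustrative.

```python
# Per-TB monthly storage figures quoted in the key figures above (USD).
OBJECT_STORAGE_USD_PER_TB = 70   # S3 + SSD cache
INCUMBENT_USD_PER_TB = 1600      # RAM-resident incumbents


def monthly_storage_cost(size_tb: float, usd_per_tb: float) -> float:
    """Storage-only monthly cost; excludes query and write charges."""
    return size_tb * usd_per_tb


size_tb = 10  # hypothetical dataset size
tiered = monthly_storage_cost(size_tb, OBJECT_STORAGE_USD_PER_TB)
incumbent = monthly_storage_cost(size_tb, INCUMBENT_USD_PER_TB)
print(tiered, incumbent)               # 700 16000
print(round(incumbent / tiered, 1))    # 22.9
```

The ratio is independent of dataset size, so the absolute saving grows linearly with the amount of data stored.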

Sources

  1. https://turbopuffer.com/pricing
  2. https://turbopuffer.com/blog/turbopuffer
  3. https://turbopuffer.com/docs
  4. https://turbopuffer.com/docs/pricing-log
  5. https://github.com/turbopuffer/turbopuffer-typescript
  6. https://github.com/turbopuffer/turbopuffer-go
  7. https://news.ycombinator.com/item?id=40916786
  8. https://zilliz.com/blog/the-cost-of-consequence-what-no-one-tells-you-about-serverless-vector-databases
  9. https://www.modern-datatools.com/tools/turbopuffer

Figures last verified 2026-06-27. Spotted an error? corrections@apibenchmarks.com