Rate limits

Rate limits are applied per API key on a sliding window and are shared between the REST API and the MCP server: a request to either counts against the same bucket.

Default plan limits

Plan      Requests per minute   Requests per day   Concurrent in-flight
Free      30                    1,000              5
Starter   120                   10,000             20
Pro       600                   100,000            100
Scale     Custom                Custom             Custom

Burst capacity is twice the per-minute limit, refilled at the steady-state rate.
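
To make the burst rule concrete, here is a minimal client-side model of it. It assumes the limiter behaves like a token bucket whose capacity is twice the per-minute limit and which refills continuously at limit/60 tokens per second. The class name BurstBucket and the continuous-refill detail are assumptions for illustration, not a description of how the server is implemented.

```python
class BurstBucket:
    """Illustrative model only (assumed behaviour, not the server's code)."""

    def __init__(self, per_minute: int, now: float = 0.0) -> None:
        self.rate = per_minute / 60.0      # steady-state refill, tokens per second
        self.capacity = 2 * per_minute     # burst capacity is twice the per-minute limit
        self.tokens = float(self.capacity) # start with a full bucket
        self.last = now                    # time of the previous refill calculation

    def try_acquire(self, now: float) -> bool:
        """Spend one token if available; `now` is a timestamp in seconds."""
        elapsed = max(0.0, now - self.last)
        self.last = now
        # Refill at the steady-state rate, never above the burst capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Under these assumptions a Free key (30 requests per minute) could send 60 requests back to back, after which it earns one new request every two seconds.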

How limits are signalled

Every response includes three headers:

Header                  Meaning
X-RateLimit-Limit       Steady-state cap for this key (per minute).
X-RateLimit-Remaining   Capacity left in the current minute.
X-RateLimit-Reset       Unix timestamp at which the window resets.
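
One way to use these headers is to pace requests so the remaining capacity lasts until the reset. The sketch below is a hypothetical helper, not part of any SDK. It assumes the headers arrive in a plain mapping with the exact casing shown above and that X-RateLimit-Reset is expressed in seconds.

```python
import time
from typing import Mapping, Optional


def pacing_delay(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Seconds to wait before the next request (hypothetical helper).

    Spreads the remaining capacity evenly over the time left in the window.
    Assumes X-RateLimit-Reset is a Unix timestamp in seconds.
    """
    now = time.time() if now is None else now
    remaining = int(headers["X-RateLimit-Remaining"])
    reset_at = float(headers["X-RateLimit-Reset"])
    seconds_left = max(0.0, reset_at - now)
    if remaining <= 0:
        return seconds_left  # nothing left: wait for the window to reset
    return seconds_left / remaining


# Example: 30 requests left and 60 seconds until reset gives one request every 2 seconds.
example = {
    "X-RateLimit-Limit": "120",
    "X-RateLimit-Remaining": "30",
    "X-RateLimit-Reset": "1060",
}
delay = pacing_delay(example, now=1000.0)
```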

When you hit the limit, the API returns a 429 with a Retry-After header giving the wait in seconds. All three SDKs handle this automatically with exponential backoff.
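
If you call the API without an SDK, the same behaviour can be approximated by hand. The sketch below is an assumption-laden illustration rather than the SDKs' actual logic: call_with_retry, its parameters, and the (status, headers, body) response shape are all hypothetical, and send stands in for whatever HTTP client you use.

```python
import random
import time
from typing import Callable, Mapping, Tuple

# Hypothetical response shape: (status code, headers, body).
Response = Tuple[int, Mapping[str, str], bytes]


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Retry on 429, preferring Retry-After over exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt == max_attempts - 1:
            break  # out of attempts: hand the 429 back to the caller
        retry_after = headers.get("Retry-After")
        if retry_after is not None:
            delay = float(retry_after)          # server-provided wait, in seconds
        else:
            delay = base_delay * (2 ** attempt) # fallback: exponential backoff
        # Up to 10% jitter so parallel workers do not retry in lockstep.
        sleep(delay + random.uniform(0, 0.1 * delay))
    return status, headers, body
```

The sleep parameter is injectable so the function can be exercised in tests without real waiting.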

Heavy endpoints

A few endpoints have additional concurrency limits because they invoke a model with a longer tail latency:

Excess requests queue server-side for up to 30 seconds; beyond that, you get a 429 with code: "rate_limited".

Designing for throughput

If you’re orchestrating hundreds of calls (see Scoring at scale), the rule of thumb is:

  1. Keep concurrency ≤ min(plan_concurrency, 50).
  2. Honour Retry-After. Don’t poll-and-burn.
  3. Stagger startup if you’re fanning out across many workers; otherwise they all hit the limiter together.
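
The concurrency cap and staggered startup from the list above can be sketched with asyncio. This is a minimal illustration under stated assumptions: run_bounded, max_stagger, and the job callables are hypothetical names, and each job is expected to honour Retry-After itself when it receives a 429.

```python
import asyncio
import random
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")


async def run_bounded(
    jobs: Iterable[Callable[[], Awaitable[T]]],
    plan_concurrency: int,
    max_stagger: float = 1.0,
) -> List[T]:
    """Run jobs with concurrency <= min(plan_concurrency, 50) and a jittered start."""
    limit = min(plan_concurrency, 50)
    gate = asyncio.Semaphore(limit)

    async def worker(job: Callable[[], Awaitable[T]]) -> T:
        # Random initial delay so workers do not all hit the limiter together.
        await asyncio.sleep(random.uniform(0, max_stagger))
        async with gate:  # at most `limit` jobs are in flight at once
            return await job()

    # gather preserves the order of `jobs` in the returned list.
    return await asyncio.gather(*(worker(job) for job in jobs))
```

A caller on the Starter plan might invoke it as asyncio.run(run_bounded(jobs, plan_concurrency=20)), where each element of jobs is a zero-argument coroutine function that issues one request.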

Asking for a higher limit

Plan upgrades raise the limit immediately. For Scale-tier custom limits, write to support@brightbean.xyz with your expected QPS and traffic shape.