Rate limits

Rate limits are applied per API key on a sliding window. The REST API and the MCP server share one bucket: a request to either counts against the same limits.

Default plan limits

Plan	Requests per minute	Requests per day	Concurrent in-flight
Free	30	1,000	5
Starter	120	10,000	20
Pro	600	100,000	100
Scale	Custom	Custom	Custom

Burst capacity is twice the per-minute limit, refilled at the steady-state rate.
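One way to picture the burst rule is as a token bucket, sketched below. This is a client-side model for pacing your own requests, not the server's actual implementation: it assumes the bucket holds twice the per-minute limit and refills continuously at one sixtieth of that limit per second. The class name `TokenBucket` and its methods are hypothetical.

```python
import time
from typing import Optional


class TokenBucket:
    """Client-side model of the limiter, ASSUMING token-bucket semantics."""

    def __init__(self, per_minute: int, now: Optional[float] = None):
        # Burst capacity is twice the per-minute limit.
        self.capacity = 2 * per_minute
        # Steady-state refill rate, in tokens per second.
        self.refill_per_sec = per_minute / 60.0
        self.tokens = float(self.capacity)
        self.updated = time.monotonic() if now is None else now

    def try_acquire(self, now: Optional[float] = None) -> bool:
        """Take one token if available; return False if the bucket is empty."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill for the elapsed time, never exceeding the burst capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Under this model a Free key (30 requests per minute) could send 60 requests back to back, then one more every two seconds.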

How limits are signalled

Every response includes three headers:

Header	Meaning
`X-RateLimit-Limit`	Steady-state cap for this key (per minute).
`X-RateLimit-Remaining`	Capacity left in the current minute.
`X-RateLimit-Reset`	Unix timestamp at which the window resets.

When you hit the limit, the response is a 429 with a `Retry-After` header giving the number of seconds to wait. All three SDKs handle this automatically with exponential backoff.
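If you call the API without an SDK, the same behaviour can be approximated by hand. The sketch below is not the SDKs' actual logic: `retry_delay`, `call_with_retry`, and the `send` callable (which is assumed to return a `(status, headers, body)` tuple) are hypothetical names, and the backoff parameters are illustrative. It waits for `Retry-After` when the header is present and falls back to jittered exponential backoff otherwise.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple


def retry_delay(status: int, headers: Mapping[str, str], attempt: int,
                base: float = 1.0, cap: float = 60.0) -> Optional[float]:
    """Return the seconds to wait before retrying, or None if no retry is needed."""
    if status != 429:
        return None
    # Assumes a plain dict; most HTTP clients expose case-insensitive headers.
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # Unparseable value: fall through to backoff.
    # Exponential backoff with full jitter, capped at `cap` seconds.
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


def call_with_retry(send: Callable[[], Tuple[int, Mapping[str, str], object]],
                    max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep):
    """Call `send` until it stops returning 429 or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        delay = retry_delay(status, headers, attempt)
        if delay is None:
            return status, body
        if attempt < max_attempts - 1:
            sleep(delay)
    raise RuntimeError("still rate limited after %d attempts" % max_attempts)
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.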

Heavy endpoints

A few endpoints have additional concurrency limits because they invoke a model with a longer tail latency:

POST /v1/score/video-hook — max 2 concurrent per key, regardless of plan.
POST /v1/benchmark/channel — max 5 concurrent per key.

Excess requests queue server-side for up to 30 seconds; after that they fail with a 429 and `code: "rate_limited"`.
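To avoid relying on the server-side queue, you can enforce the same caps on the client. The following is a minimal sketch using `asyncio.Semaphore`; `EndpointGate` and `make_call` are hypothetical names, and `make_call` stands in for whatever coroutine function actually performs your request. It assumes all calls for a given key go through one process.

```python
import asyncio

# Per-key concurrency caps for the heavy endpoints listed above.
HEAVY_LIMITS = {
    "/v1/score/video-hook": 2,
    "/v1/benchmark/channel": 5,
}


class EndpointGate:
    """Caps in-flight calls per heavy endpoint; other paths pass through."""

    def __init__(self, limits):
        self._limits = dict(limits)
        self._sems = {}

    def _sem(self, path):
        # Created lazily so each semaphore is built inside the running loop.
        if path not in self._sems:
            self._sems[path] = asyncio.Semaphore(self._limits[path])
        return self._sems[path]

    async def run(self, path, make_call):
        """Await make_call(), holding a slot if `path` is a heavy endpoint."""
        if path not in self._limits:
            return await make_call()
        async with self._sem(path):
            return await make_call()
```

If several processes share one API key, this per-process gate is not enough on its own; the caps would need to be divided among them or coordinated externally.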

Designing for throughput

If you’re orchestrating hundreds of calls (see Scoring at scale), the rules of thumb are:

Keep concurrency ≤ min(plan_concurrency, 50).
Honour Retry-After. Don’t poll-and-burn.
Stagger startup if you’re fanning out across many workers; otherwise they all hit the limiter together.
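The first and third rules can be sketched in a few lines. The helper names `max_concurrency` and `startup_delays` are hypothetical, and the slot-plus-jitter scheme is only one reasonable way to stagger workers; the plan figures come from the table under Default plan limits.

```python
import random
from typing import List, Optional

# Concurrent in-flight limits per plan. Scale is custom, so it is omitted here.
PLAN_CONCURRENCY = {"Free": 5, "Starter": 20, "Pro": 100}


def max_concurrency(plan: str, ceiling: int = 50) -> int:
    """Apply the rule: concurrency <= min(plan_concurrency, 50)."""
    return min(PLAN_CONCURRENCY[plan], ceiling)


def startup_delays(workers: int, spread_seconds: float,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Give each worker a start offset inside its own slot of the spread window."""
    if workers <= 0:
        return []
    rng = rng or random.Random()
    slot = spread_seconds / workers
    # Worker i starts at a random point within [i * slot, (i + 1) * slot].
    return [i * slot + rng.uniform(0.0, slot) for i in range(workers)]
```

For example, `max_concurrency("Pro")` gives 50 rather than 100, and `startup_delays(10, 60.0)` spreads ten workers across one minute so they do not reach the limiter at the same instant.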

Asking for a higher limit

Plan upgrades raise the limit immediately. For Scale-tier custom limits, write to support@brightbean.xyz with your expected QPS and traffic shape.