Rate limits
Rate limits are applied per API key, on a sliding window, and are shared between the REST API and the MCP server: a request to either counts against the same bucket.
Default plan limits
| Plan | Requests per minute | Requests per day | Concurrent in-flight |
|---|---|---|---|
| Free | 30 | 1,000 | 5 |
| Starter | 120 | 10,000 | 20 |
| Pro | 600 | 100,000 | 100 |
| Scale | Custom | Custom | Custom |
Burst capacity is twice the per-minute limit, refilled at the steady-state rate.
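To make the burst rule concrete, here is a minimal client-side model of it, assuming the limiter behaves like a token bucket. That is an assumption for illustration only: the server's actual algorithm is not specified here, and `TokenBucket` is a hypothetical name, not part of any SDK.

```python
class TokenBucket:
    """Illustrative model: capacity is 2x the per-minute limit,
    refilled continuously at the steady-state rate."""

    def __init__(self, per_minute: int):
        self.per_minute = per_minute
        self.capacity = 2 * per_minute        # burst capacity
        self.tokens = float(self.capacity)    # start with a full bucket
        self.last = 0.0                       # time of last refill, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a request at time `now` fits in the bucket."""
        elapsed = now - self.last
        refill = elapsed * self.per_minute / 60.0   # steady-state refill
        self.tokens = min(self.capacity, self.tokens + refill)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Under this model a Free-plan key (30 requests per minute) could send 60 requests back to back, after which one new request becomes available every 2 seconds.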
How limits are signalled
Every response includes three headers:
| Header | Meaning |
|---|---|
| X-RateLimit-Limit | Steady-state cap for this key (per minute). |
| X-RateLimit-Remaining | Capacity left in the current minute. |
| X-RateLimit-Reset | Unix timestamp at which the window resets. |
When you hit the limit, you get a 429 response with a Retry-After header (in seconds). All three SDKs handle this automatically with exponential backoff.
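If you are calling the API without an SDK, the logic looks roughly like the sketch below. It is not the SDKs' implementation: `parse_rate_limit` and `retry_delay` are hypothetical helpers, the `base` and `cap` values are arbitrary, and `headers` is assumed to be a plain dict keyed by the exact header names shown above (most HTTP libraries expose a case-insensitive mapping instead).

```python
def parse_rate_limit(headers):
    """Read the three rate-limit headers as integers (None if absent)."""
    def as_int(name):
        value = headers.get(name)
        return int(value) if value is not None else None

    return {
        "limit": as_int("X-RateLimit-Limit"),
        "remaining": as_int("X-RateLimit-Remaining"),
        "reset": as_int("X-RateLimit-Reset"),   # Unix timestamp
    }


def retry_delay(status, headers, attempt, base=1.0, cap=60.0):
    """Seconds to wait before retrying, or None if no retry is needed.

    Prefers the server's Retry-After value; falls back to capped
    exponential backoff if the header is missing.
    """
    if status != 429:
        return None
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        return float(retry_after)
    return min(cap, base * 2 ** attempt)
```

A caller would sleep for the returned number of seconds and then reissue the request, incrementing `attempt` each time.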
Heavy endpoints
A few endpoints have additional concurrency limits because they invoke a model with a longer tail latency:
- POST /v1/score/video-hook: max 2 concurrent per key, regardless of plan.
- POST /v1/benchmark/channel: max 5 concurrent per key.
Excess requests queue server-side up to 30s; beyond that you get a 429 with code: "rate_limited".
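One way to stay under these per-endpoint caps is to gate the calls on the client, so that requests wait locally instead of sitting in the server-side queue. The sketch below uses `asyncio.Semaphore`; `HeavyEndpointGate` and the `call` argument are hypothetical, and the limits are copied from the list above.

```python
import asyncio

# Per-key concurrency caps for the heavy endpoints listed above.
HEAVY_LIMITS = {
    "/v1/score/video-hook": 2,
    "/v1/benchmark/channel": 5,
}


class HeavyEndpointGate:
    """Caps in-flight calls per heavy endpoint within one process."""

    def __init__(self, limits=HEAVY_LIMITS):
        self._limits = dict(limits)
        self._semaphores = {}

    def _semaphore(self, path):
        # Create each semaphore lazily, on first use of its endpoint.
        if path not in self._semaphores:
            self._semaphores[path] = asyncio.Semaphore(self._limits[path])
        return self._semaphores[path]

    async def run(self, path, call):
        """Await `call()` (a zero-argument coroutine function),
        holding the endpoint's semaphore if the endpoint is capped."""
        if path not in self._limits:
            return await call()
        async with self._semaphore(path):
            return await call()
```

Because the caps apply per key, a gate like this only helps if every process sharing the key coordinates; across several workers you would need to divide the cap between them or use a shared limiter.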
Designing for throughput
If you’re orchestrating hundreds of calls (see Scoring at scale), the rules of thumb are:
- Keep concurrency ≤ min(plan_concurrency, 50).
- Honour Retry-After. Don’t poll-and-burn.
- Stagger startup if you’re fanning out across many workers; otherwise they all hit the limiter together.
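The first and third rules can be sketched as two small helpers. Both names are hypothetical, and the slot-plus-jitter scheme is just one reasonable way to stagger workers, not a documented recommendation.

```python
import random


def client_concurrency(plan_concurrency, ceiling=50):
    """Concurrency to use: the plan's in-flight limit, capped at 50."""
    return min(plan_concurrency, ceiling)


def startup_delays(workers, spread_seconds, rng=random):
    """Start offsets (in seconds) that spread workers over a window.

    Each worker gets its own time slot plus random jitter inside it,
    so workers do not all reach the limiter at the same instant.
    """
    if workers <= 0:
        return []
    slot = spread_seconds / workers
    return [i * slot + rng.uniform(0, slot) for i in range(workers)]
```

For example, a Pro key (100 concurrent in-flight) would run with `client_concurrency(100)`, which is 50, and each worker would sleep for its entry in `startup_delays(...)` before sending its first request.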
Asking for a higher limit
Plan upgrades raise the limit immediately. For Scale-tier custom limits, write to support@brightbean.xyz with your expected QPS and traffic shape.