
Fair use. Scale when you need.

Rate limits keep the platform stable for everyone. The Free tier starts out generous; upgrade to Pro or Enterprise when your traffic grows. All limits are soft: every response carries headers that report your current usage, so you can self-regulate.

3 min read · v2.4.1 · Updated May 14, 2026 · Level: Beginner

Rate limit tiers

Limits are enforced per API key and apply to all models uniformly. Token limits count both input and output tokens.

| Plan | Requests / min | Tokens / min | Tokens / day |
|---|---|---|---|
| Free | 20 | 100K | 200K |
| Pro | 1,000 | 10M | Unlimited |
| Enterprise | Custom | Custom | Unlimited |

Streaming requests count differently. A single streaming connection counts as one request regardless of how many chunks are delivered. This means streaming is often more rate-limit-friendly than polling.
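
Because token limits count input and output tokens together, it can help to estimate a request's worst-case cost before sending it. The sketch below is illustrative only: it assumes "K" and "M" in the table mean thousands and millions, and the names `TOKEN_LIMITS`, `tokens_charged`, and `fits_budget` are hypothetical helpers, not part of any Bentoo AI client.

```python
# Token limits copied from the tiers table; None means unlimited.
# Assumption: 100K = 100,000 and 10M = 10,000,000.
TOKEN_LIMITS = {
    "free": {"per_minute": 100_000, "per_day": 200_000},
    "pro": {"per_minute": 10_000_000, "per_day": None},
}


def tokens_charged(input_tokens, output_tokens):
    """Input and output tokens both count toward the token limits."""
    return input_tokens + output_tokens


def fits_budget(plan, used_minute, used_day, input_tokens, max_output_tokens):
    """Return True if the request's worst-case cost fits the remaining budget.

    used_minute / used_day are token counts you have tracked yourself.
    """
    cost = tokens_charged(input_tokens, max_output_tokens)
    limits = TOKEN_LIMITS[plan]
    if used_minute + cost > limits["per_minute"]:
        return False
    per_day = limits["per_day"]
    return per_day is None or used_day + cost <= per_day
```

On the Free plan, for example, a request with 1,500 input tokens and a 500-token output cap would be rejected by this check once 199,000 of the day's 200,000 tokens are used.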

Rate limit headers

Every response includes headers that tell you your current limit status:

| Header | Example | Description |
|---|---|---|
| X-RateLimit-Limit | 1000 | Maximum requests allowed per minute for your plan. |
| X-RateLimit-Remaining | 847 | Requests remaining in the current window. |
| X-RateLimit-Reset | 1715689200 | Unix timestamp when the current rate-limit window resets. |
| Retry-After | 3 | Seconds to wait before retrying (only present on 429 responses). |
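
As a rough illustration of self-regulating from these headers, the sketch below turns them into numbers a client can act on. It assumes the headers arrive as a plain mapping of name to string value (most HTTP libraries expose something similar); `parse_rate_limit` is a hypothetical helper name.

```python
import time


def parse_rate_limit(headers, now=None):
    """Summarize the rate-limit headers from one response.

    `headers` is any mapping of header name to string value.
    `now` is the current Unix time; it defaults to time.time().
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    now = time.time() if now is None else now
    reset_at = int(h["x-ratelimit-reset"])
    retry_after = h.get("retry-after")  # only present on 429 responses
    return {
        "limit": int(h["x-ratelimit-limit"]),
        "remaining": int(h["x-ratelimit-remaining"]),
        "seconds_until_reset": max(0.0, reset_at - now),
        "retry_after": int(retry_after) if retry_after is not None else None,
    }
```

A client could, for instance, pause until `seconds_until_reset` has elapsed whenever `remaining` reaches zero, instead of waiting for a 429.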

Retry strategy

When you hit a 429, wait at least the number of seconds specified in Retry-After before trying again. For robust integrations, use exponential backoff with jitter:

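import random
import time

import bentoo  # the client package that defines RateLimitError


def backoff_retry(fn, max_retries=5):
    """Call fn(), retrying on rate-limit errors with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except bentoo.RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error to the caller
            # Exponential backoff (1s, 2s, 4s, ...) plus up to 1s of random jitter.
            wait = (2 ** attempt) + random.uniform(0, 1)
            time.sleep(wait)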
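
One way to combine both pieces of advice above (wait at least Retry-After, and back off exponentially) is to take the larger of the two delays. This is a sketch under assumptions: how the client exposes the Retry-After value on a 429 is not specified here, so the function simply takes it as a parameter, and the 60-second `cap` is an illustrative choice rather than a documented limit.

```python
import random


def retry_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to sleep before retry number `attempt` (0-based).

    retry_after: value of the Retry-After header in seconds, if known.
    base, cap:   hypothetical tuning knobs for the backoff curve.
    """
    # Exponential backoff, capped, plus up to 1s of random jitter.
    backoff = min(cap, base * (2 ** attempt)) + random.uniform(0, 1)
    if retry_after is None:
        return backoff
    # Never retry sooner than the server asked.
    return max(float(retry_after), backoff)
```

Taking the maximum keeps the client from retrying early on the first attempts, while later attempts still spread out as the backoff grows past the server's hint.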

Burst handling

Bentoo AI uses a sliding window algorithm with a small burst bucket. You can briefly exceed your average RPM by up to 20% before hard throttling kicks in. This smooths out traffic spikes without dropping legitimate requests.
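
To stay under the limit without relying on the burst allowance, a client can keep its own sliding window of recent request times. The sketch below is a client-side approximation only, not the server's algorithm; `SlidingWindowThrottle` and `burst_ceiling` are hypothetical names, and reading "up to 20%" as a ceiling of RPM plus 20% is an assumption.

```python
import time
from collections import deque


def burst_ceiling(rpm, burst_percent=20):
    """Assumed short-term ceiling: the plan's RPM plus the burst allowance."""
    return rpm + rpm * burst_percent // 100


class SlidingWindowThrottle:
    """Allow at most `rpm` requests in any `window`-second span (client side)."""

    def __init__(self, rpm, window=60.0, clock=time.monotonic):
        self.rpm = rpm
        self.window = window
        self.clock = clock  # injectable so the class can be tested without sleeping
        self.sent = deque()  # timestamps of requests still inside the window

    def try_acquire(self):
        """Return True and record the request if one may be sent now."""
        now = self.clock()
        # Drop timestamps that have slid out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.rpm:
            self.sent.append(now)
            return True
        return False
```

Throttling to the plain RPM figure rather than the burst ceiling leaves the 20% headroom available for spikes the client did not anticipate.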

Do not ignore 429s. Repeatedly retrying without backoff will get your IP temporarily blocked. Always honor Retry-After and implement exponential backoff.

Upgrading your limits

Hitting limits regularly? Upgrade in the dashboard. New limits take effect immediately after payment confirmation; no restart or key rotation is required.