
Your first call in 60 seconds.

From zero to a streaming chat completion. One key, every model: a drop-in OpenAI-compatible endpoint, 40+ models, and prices 10% below the official provider rates. Pick a language and copy.

2 min read · v2.4.1 · Updated Apr 27, 2026 · Level: Beginner

1. Get your API key

Sign in at bentoo.ai/dashboard and click Create key. Keys start with btoo_ and are shown once — store yours in a password manager or a secret vault.

Every new account comes with $1 in free credit, which is roughly 25 million tokens on Qwen3.5-Flash. No credit card required.
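Together, the two figures above imply a price of about $0.04 per million tokens on Qwen3.5-Flash. The following small Python sketch shows the arithmetic. tokens_for_budget is a hypothetical helper. It assumes a single flat per-token rate, but real pricing may differ between input and output tokens.

```python
from decimal import Decimal

# Derived from the figures above ($1 buys ~25M tokens); not an official rate card.
IMPLIED_PRICE_PER_MILLION = Decimal("1") / Decimal("25")  # USD per 1M tokens

def tokens_for_budget(budget_usd: str, price_per_million_usd: Decimal) -> int:
    """Return how many tokens a budget buys at a flat per-million-token price."""
    # Pass money as a string so Decimal stays exact (no float rounding).
    return int(Decimal(budget_usd) / price_per_million_usd * 1_000_000)

print(IMPLIED_PRICE_PER_MILLION)                          # 0.04
print(tokens_for_budget("1", IMPLIED_PRICE_PER_MILLION))  # 25000000
```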

Set your environment

Export the key as BENTOO_API_KEY so the SDKs pick it up automatically:

# Add to ~/.zshrc or ~/.bashrc
export BENTOO_API_KEY="btoo_sk_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
export BENTOO_BASE_URL="https://api.bentoo.ai/v1"
# Windows PowerShell
$env:BENTOO_API_KEY = "btoo_sk_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
$env:BENTOO_BASE_URL = "https://api.bentoo.ai/v1"
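If you call the HTTP API directly instead of through an SDK, you have to read these variables yourself. Here is a minimal sketch using a hypothetical helper named load_bentoo_config. The fallback to https://api.bentoo.ai/v1 mirrors the value exported above. This page does not say whether the SDKs apply the same default.

```python
import os

DEFAULT_BASE_URL = "https://api.bentoo.ai/v1"  # same value as the export above

def load_bentoo_config(env=os.environ):
    """Hypothetical helper: return (api_key, base_url), failing fast on bad input."""
    key = env.get("BENTOO_API_KEY")
    if not key:
        raise RuntimeError("BENTOO_API_KEY is not set; export it first.")
    # Keys start with btoo_ (see step 1), so a key from another provider fails here.
    if not key.startswith("btoo_"):
        raise RuntimeError("BENTOO_API_KEY does not start with 'btoo_'.")
    # Use the override if one is exported; strip a trailing slash for clean joins.
    base_url = env.get("BENTOO_BASE_URL", DEFAULT_BASE_URL).rstrip("/")
    return key, base_url
```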

2. Install the SDK

Bentoo AI is fully compatible with the OpenAI SDK: keep your existing code and just point the base URL at us. Or use our typed first-party SDKs for stricter typing.

pip install bentoo                  # first-party SDK
pip install openai                   # or use OpenAI SDK
npm install bentoo
# or
pnpm add bentoo
yarn add bentoo
go get github.com/bentoo-ai/bentoo-go@latest
cargo add bentoo

3. Make your first call

A minimal chat completion. Swap model for any of the 40+ supported models; the payload shape is the same across providers.

POST https://api.bentoo.ai/v1/chat/completions
from bentoo import Bentoo

client = Bentoo()  # reads BENTOO_API_KEY from env

response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[
        {"role": "user", "content": "Write a haiku about TCP."}
    ],
    temperature=0.7,
)

print(response.choices[0].message.content)
import Bentoo from "bentoo";

const client = new Bentoo();  // reads BENTOO_API_KEY from env

const response = await client.chat.completions.create({
  model: "claude-sonnet-4-6",
  messages: [
    { role: "user", content: "Write a haiku about TCP." }
  ],
  temperature: 0.7,
});

console.log(response.choices[0].message.content);
curl https://api.bentoo.ai/v1/chat/completions \
  -H "Authorization: Bearer $BENTOO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [
      { "role": "user", "content": "Write a haiku about TCP." }
    ],
    "temperature": 0.7
  }'
package main

import (
    "context"
    "fmt"
    "log"

    "github.com/bentoo-ai/bentoo-go"
)

func main() {
    client := bentoo.NewClient()
    resp, err := client.Chat.Completions.Create(
        context.Background(),
        bentoo.ChatRequest{
            Model: "claude-sonnet-4-6",
            Messages: []bentoo.Message{
                {Role: "user", Content: "Write a haiku about TCP."},
            },
        },
    )
    if err != nil {
        log.Fatal(err) // never index into a nil response
    }
    fmt.Println(resp.Choices[0].Message.Content)
}
Got a response? You're live. The same code works against Qwen 3 Max, GPT-5, Gemini 2.5 Pro, DeepSeek V3, and Llama 3.3: just change the model string. See the model registry for all 40+ models.
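The endpoint is plain HTTPS plus JSON (the curl tab above makes the same request), so you can also call it with no SDK at all. The sketch below uses only Python's standard library. build_chat_request and send are hypothetical helper names. Reading choices[0].message.content assumes the OpenAI-style response shape that the SDK examples imply.

```python
import json
import urllib.request

def build_chat_request(base_url, api_key, model, prompt, temperature=0.7):
    """Build (but do not send) the same POST the curl example makes."""
    payload = {
        "model": model,  # the only field that changes between models
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def send(req, timeout=60):
    """Send the request and return the assistant text (assumes OpenAI-style body)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Usage, with BENTOO_API_KEY exported: print(send(build_chat_request("https://api.bentoo.ai/v1", os.environ["BENTOO_API_KEY"], "gpt-5", "Write a haiku about TCP."))). This requires import os. Switching providers means changing only the model argument.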

4. Stream tokens as they arrive

For chat UIs and long completions, set stream: true (stream=True in Python). Tokens arrive over Server-Sent Events, and the first token usually arrives in under 400 ms.

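from bentoo import Bentoo

client = Bentoo()

stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Explain quantum computing."}],
    stream=True,
)
try:
    for chunk in stream:
        # Guard against chunks that may carry no choices (e.g. a final usage chunk).
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
finally:
    stream.close()  # always close, even if the loop exits early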
import Bentoo from "bentoo";

const client = new Bentoo();  // reads BENTOO_API_KEY from env

const stream = await client.chat.completions.create({
  model: "claude-sonnet-4-6",
  messages: [{role: "user", content: "Explain quantum computing."}],
  stream: true,
});

for await (const chunk of stream) {
  if (chunk.choices[0]?.delta?.content) {
    process.stdout.write(chunk.choices[0].delta.content);
  }
}
curl -N https://api.bentoo.ai/v1/chat/completions \
  -H "Authorization: Bearer $BENTOO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Explain quantum computing."}],
    "stream": true
  }'
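Without an SDK, you have to parse the Server-Sent Events yourself. Here is a minimal sketch that assumes the OpenAI-style framing used by OpenAI-compatible endpoints: each event is a data: line holding one JSON chunk, and data: [DONE] ends the stream. This page does not spell out Bentoo's wire format, and iter_sse_deltas is a hypothetical name. The sketch also simplifies SSE by treating every data: line as a complete event.

```python
import json

def iter_sse_deltas(lines):
    """Yield content deltas from an OpenAI-style SSE stream (assumed framing)."""
    for raw in lines:
        # Accept raw bytes from an HTTP response or already-decoded strings.
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        # Skip blank event separators, ": keep-alive" comments, and other fields.
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # end-of-stream sentinel
            return
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        # Some chunks may carry no choices, or a delta without content (role only).
        if choices:
            content = (choices[0].get("delta") or {}).get("content")
            if content:
                yield content
```

To consume a live response, send a request whose body includes "stream": true. Then iterate the object returned by urllib.request.urlopen(req), which yields raw lines as bytes, and print each delta with print(text, end="", flush=True).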
Always close the stream: if the client disconnects mid-stream, the request keeps generating and is billed. Wrap the loop in try/finally and call stream.close(), or use a context manager.
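Here is the close-on-exit pattern from the note above, sketched with contextlib.closing. closing calls .close() on any object that has one, including the stream.close() method the note mentions. fake_stream is a hypothetical stand-in generator so the example runs offline. With the SDK, you would pass the object returned by create(..., stream=True).

```python
from contextlib import closing

def fake_stream(log):
    """Hypothetical stand-in for an SDK stream: a generator that records cleanup."""
    try:
        for token in ["Quantum ", "bits ", "superpose."]:
            yield token
    finally:
        # Runs when the consumer calls .close(), even after an early break.
        log.append("closed")

def read_first_tokens(stream, limit):
    """Consume at most `limit` tokens, then guarantee the stream is closed."""
    out = []
    with closing(stream):  # calls stream.close() on exit, including on exceptions
        for token in stream:
            out.append(token)
            if len(out) >= limit:
                break  # stop early, e.g. the user clicked "Stop generating"
    return out
```

Usage: read_first_tokens(fake_stream(log), 2) returns the first two tokens, and the generator's cleanup runs immediately rather than whenever it is garbage-collected.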

5. Request parameters

Core fields for the chat.completions endpoint. The full spec is in the API reference.

model (string, required): Model ID, e.g. gpt-5, claude-sonnet-4-6, gemini-2.5-pro, deepseek-v3. See the model registry for the full list.
messages (array, required): Conversation history as {role, content} objects. Roles: system, user, assistant, tool.
temperature (number, optional): Sampling temperature, from 0 to 2. Lower values are more deterministic. Default 1.
max_tokens (integer, optional): Hard cap on output tokens. Defaults to the model's maximum output length.
stream (boolean, optional): If true, partial deltas are sent over SSE. Default false.
tools (array, optional): A list of tools the model may call. See function calling.
response_format (object, optional): Force JSON output with { "type": "json_object" }, or pass a JSON Schema for strict structured output.
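A client-side check of the constraints listed above can catch typos before a request is sent. In this minimal sketch, build_chat_payload is a hypothetical helper. Requiring a non-empty messages list is an assumption, and tools is omitted for brevity. Optional fields stay out of the body unless set, so the server-side defaults listed above (temperature 1, stream false) still apply. The server remains the source of truth.

```python
VALID_ROLES = {"system", "user", "assistant", "tool"}  # roles listed above

def build_chat_payload(model, messages, *, temperature=None, max_tokens=None,
                       stream=False, response_format=None):
    """Hypothetical helper: assemble a chat.completions body and validate it."""
    if not model:
        raise ValueError("model is required")
    if not messages:  # assumption: an empty conversation is never useful
        raise ValueError("messages must be a non-empty list")
    for m in messages:
        if m.get("role") not in VALID_ROLES:
            raise ValueError(f"unknown role: {m.get('role')!r}")
    body = {"model": model, "messages": messages}
    if temperature is not None:
        if not 0 <= temperature <= 2:  # documented range is 0 to 2
            raise ValueError("temperature must be between 0 and 2")
        body["temperature"] = temperature
    if max_tokens is not None:
        if max_tokens < 1:
            raise ValueError("max_tokens must be a positive integer")
        body["max_tokens"] = max_tokens
    if stream:
        body["stream"] = True
    if response_format is not None:
        body["response_format"] = response_format
    return body
```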

6. Response codes

Bentoo AI follows standard HTTP semantics. Errors return a JSON body with error.code and error.message.

200 OK: Successful completion. The body contains choices[] and usage.
401 Unauthorized: Missing or invalid API key.
402 Payment Required: Account out of credits. Top up in the dashboard.
429 Rate Limit: Too many requests. Honor the Retry-After header and back off.
503 Upstream: Provider unavailable. We auto-retry on a healthy mirror, and the second try almost always succeeds.
Smart fallback is on by default: if the primary provider returns a 5xx error, Bentoo AI retries on a mirror with the same model identity. To disable this, send the header X-Bentoo-Fallback: off.
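Here is a sketch of client-side handling for the codes above. parse_error reads the error.code and error.message fields. retry_delay honors Retry-After and otherwise falls back to capped exponential backoff with full jitter. The set of retryable codes, the base and cap values, and the helper names are all assumptions, not documented Bentoo behavior. 401 and 402 are treated as final, because retrying will not fix a bad key or an empty balance. Note that headers.get is case-sensitive on a plain dict but case-insensitive on the header object urllib returns.

```python
import json
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

RETRYABLE = {429, 503}  # assumption: rate limit and upstream outage only

def parse_error(body: bytes):
    """Return (code, message) from a {"error": {...}} body, or (None, raw text)."""
    try:
        err = json.loads(body)["error"]
        return err.get("code"), err.get("message")
    except (ValueError, KeyError, TypeError, AttributeError):
        # Not the documented JSON shape (e.g. an HTML error page from a proxy).
        return None, body.decode("utf-8", "replace")

def retry_delay(status, headers, attempt, base=0.5, cap=30.0, now=None):
    """Seconds to wait before retry number `attempt` (0-based), or None to stop."""
    if status not in RETRYABLE:
        return None
    retry_after = headers.get("Retry-After")
    if retry_after:
        if retry_after.strip().isdigit():  # delay-seconds form, e.g. "5"
            return float(retry_after)
        try:  # HTTP-date form, e.g. "Mon, 27 Apr 2026 12:00:10 GMT"
            when = parsedate_to_datetime(retry_after)
            now = now or datetime.now(timezone.utc)
            return max(0.0, (when - now).total_seconds())
        except (TypeError, ValueError):
            pass  # unparseable header: fall through to backoff
    # Full jitter: uniform in [0, min(cap, base * 2**attempt)].
    return random.uniform(0, min(cap, base * 2 ** attempt))
```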