
Your first call in 60 seconds.

From zero to a streaming chat completion. One key, every model: a drop-in OpenAI-compatible endpoint for 30+ models, priced 10% below official rates. Pick a language and copy.

2 min read · v2.4.1 · Updated Apr 27, 2026 · Level: Beginner

1. Get your API key

Sign in at bentoo.ai/dashboard and click Create key. Keys start with sk- and are shown once — store yours in a password manager or a secret vault.

Set your environment

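Export the key as BENTOO_API_KEY so the Bentoo SDKs pick it up automatically. The OpenAI SDKs look for OPENAI_API_KEY instead, so when you use them, read BENTOO_API_KEY and BENTOO_BASE_URL from the environment and pass them to the client yourself: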

# Add to ~/.zshrc or ~/.bashrc
export BENTOO_API_KEY="sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
export BENTOO_BASE_URL="https://api.bentoo.ai/v1"

# Windows PowerShell
$env:BENTOO_API_KEY = "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
$env:BENTOO_BASE_URL = "https://api.bentoo.ai/v1"
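If you call the API from your own Python code, a small helper can read both variables and fail early when the key is missing or malformed. This is a minimal sketch: `load_bentoo_config` is a hypothetical helper, not part of any Bentoo SDK, and the prefix check relies only on the `sk-` format described in step 1.

```python
import os
from typing import Mapping, Optional, Tuple

DEFAULT_BASE_URL = "https://api.bentoo.ai/v1"


def load_bentoo_config(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (api_key, base_url). Hypothetical helper, not a Bentoo SDK API."""
    env = os.environ if env is None else env
    api_key = env.get("BENTOO_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("BENTOO_API_KEY is not set; export it first.")
    if not api_key.startswith("sk-"):
        # Bentoo keys start with "sk-"; anything else is likely a copy-paste slip.
        raise ValueError("BENTOO_API_KEY does not start with 'sk-'.")
    # Fall back to the public endpoint when BENTOO_BASE_URL is not exported,
    # and drop a trailing slash so paths can be appended safely.
    base_url = env.get("BENTOO_BASE_URL", DEFAULT_BASE_URL).rstrip("/")
    return api_key, base_url
```

Call `api_key, base_url = load_bentoo_config()` once at startup and pass the values to `OpenAI(api_key=api_key, base_url=base_url)`. Passing an explicit mapping instead of `os.environ` keeps the helper easy to test.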

2. Make your first call

A minimal chat completion. Swap model for any of 30+ supported models — same payload shape across providers.

POST https://api.bentoo.ai/v1/chat/completions

import os

from openai import OpenAI

# Read the key and base URL exported in step 1 instead of hardcoding them.
client = OpenAI(
    api_key=os.environ["BENTOO_API_KEY"],
    base_url=os.environ.get("BENTOO_BASE_URL", "https://api.bentoo.ai/v1"),
)

response = client.chat.completions.create(
    model="qwen/qwen3.7-plus",
    messages=[
        {"role": "user", "content": "Write a haiku about TCP."}
    ],
    temperature=0.7,
)

print(response.choices[0].message.content)

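import OpenAI from "openai";

// Read the key and base URL exported in step 1 instead of hardcoding them.
// Top-level await needs an ES module (an .mjs file or "type": "module").
const client = new OpenAI({
  apiKey: process.env.BENTOO_API_KEY,
  baseURL: process.env.BENTOO_BASE_URL ?? "https://api.bentoo.ai/v1",
});

const response = await client.chat.completions.create({
  model: "qwen/qwen3.7-plus",
  messages: [
    { role: "user", content: "Write a haiku about TCP." }
  ],
  temperature: 0.7,
});

console.log(response.choices[0].message.content);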

curl https://api.bentoo.ai/v1/chat/completions \
  -H "Authorization: Bearer $BENTOO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3.7-plus",
    "messages": [
      { "role": "user", "content": "Write a haiku about TCP." }
    ],
    "temperature": 0.7
  }'

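package main

import (
    "context"
    "fmt"
    "log"

    "github.com/bentoo-ai/bentoo-go"
)

func main() {
    // The Bentoo SDK picks up BENTOO_API_KEY from the environment (step 1).
    client := bentoo.NewClient()
    resp, err := client.Chat.Completions.Create(
        context.Background(),
        bentoo.ChatRequest{
            Model: "qwen/qwen3.7-plus",
            Messages: []bentoo.Message{
                {Role: "user", Content: "Write a haiku about TCP."},
            },
        },
    )
    if err != nil {
        // Surface auth, credit, and rate-limit errors instead of panicking on a nil resp.
        log.Fatal(err)
    }
    fmt.Println(resp.Choices[0].Message.Content)
}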

Got a response? You're live. The same code works against Qwen, GPT, Gemini, DeepSeek, and Llama: change the model string and nothing else. See all 30+ models in the model registry.
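Because the endpoint is plain HTTPS plus JSON, you can also call it without an SDK. The sketch below builds the same request as the curl example using only Python's standard library. `build_chat_request` is a hypothetical helper; it only constructs the request, so nothing is sent until you pass it to `urllib.request.urlopen`.

```python
import json
import urllib.request
from typing import Dict, List


def build_chat_request(
    api_key: str,
    model: str,
    messages: List[Dict[str, str]],
    base_url: str = "https://api.bentoo.ai/v1",
    **params: object,
) -> urllib.request.Request:
    """Build (but do not send) a POST to /chat/completions. Hypothetical helper."""
    # Optional fields such as temperature or max_tokens ride along in **params.
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "sk-xxxxxxxxxxxxxx",
    "qwen/qwen3.7-plus",
    [{"role": "user", "content": "Write a haiku about TCP."}],
    temperature=0.7,
)
# To send it:
# reply = json.load(urllib.request.urlopen(req))
# print(reply["choices"][0]["message"]["content"])
```

Swapping models is the same one-string change here: only the `model` argument differs between providers.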

3. Stream tokens as they arrive

For chat UIs and long completions, set stream: true (stream=True in Python). Tokens arrive over Server-Sent Events (SSE), and the first token usually arrives in under 400 ms.

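import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["BENTOO_API_KEY"],
    base_url=os.environ.get("BENTOO_BASE_URL", "https://api.bentoo.ai/v1"),
)

# The with-block closes the HTTP stream even if the loop exits early.
with client.chat.completions.create(
    model="qwen/qwen3.7-plus",
    messages=[{"role": "user", "content": "Explain quantum computing."}],
    stream=True,
) as stream:
    for chunk in stream:
        # Some chunks carry no choices or an empty delta; skip them.
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)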

import OpenAI from "openai";

// Read the key and base URL exported in step 1 instead of hardcoding them.
const client = new OpenAI({
  apiKey: process.env.BENTOO_API_KEY,
  baseURL: process.env.BENTOO_BASE_URL ?? "https://api.bentoo.ai/v1",
});

const stream = await client.chat.completions.create({
  model: "qwen/qwen3.7-plus",
  messages: [{ role: "user", content: "Explain quantum computing." }],
  stream: true,
});

// Print each text delta as it arrives; some chunks carry no content.
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) {
    process.stdout.write(delta);
  }
}

# -N turns off curl's output buffering so events print as they arrive
curl -N https://api.bentoo.ai/v1/chat/completions \
  -H "Authorization: Bearer $BENTOO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3.7-plus",
    "messages": [{"role": "user", "content": "Explain quantum computing."}],
    "stream": true
  }'
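On the wire, each SSE event is a `data:` line carrying one JSON chunk. The sketch below assumes the OpenAI-style framing that this compatible endpoint should follow (`data: {...}` lines, blank-line separators, and a final `data: [DONE]`); `iter_deltas` is a hypothetical helper that turns raw lines, such as those printed by the curl command, into text deltas.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from OpenAI-style SSE lines (assumed framing)."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separators and SSE comments such as ": keep-alive".
        if not line or line.startswith(":"):
            continue
        # Ignore other SSE fields (event:, id:, retry:).
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:  # role-only or empty deltas carry no text
                yield text


sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hello"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":", world"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_deltas(sample)))  # prints "Hello, world"
```

The SDKs do this parsing for you; a hand-rolled parser is mainly useful for debugging raw curl output or for environments without an SDK.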

Always close the stream. If you abandon a stream mid-response without closing it, the request keeps generating and is billed. Wrap the read loop in try / finally and call stream.close(), or use a context manager.
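Here is the try / finally pattern as a minimal Python sketch. To stay self-contained and runnable offline, it uses a hypothetical `FakeStream` stand-in with the same iterate-then-`close()` shape as an SDK stream, and a hypothetical `read_at_most` helper that stops early once it has enough text.

```python
from typing import List


def read_at_most(stream, max_chars: int) -> str:
    """Read text deltas until max_chars is reached, always closing the stream."""
    parts: List[str] = []
    total = 0
    try:
        for delta in stream:
            parts.append(delta)
            total += len(delta)
            if total >= max_chars:
                break  # stop early; the finally block still runs
    finally:
        # Runs on normal exit, early break, or exception (e.g. a dropped connection).
        stream.close()
    return "".join(parts)


class FakeStream:
    """Stand-in for an SDK stream: yields text deltas and records close()."""

    def __init__(self, deltas: List[str]) -> None:
        self._deltas = deltas
        self.closed = False

    def __iter__(self):
        return iter(self._deltas)

    def close(self) -> None:
        self.closed = True


s = FakeStream(["Quantum ", "computing ", "uses ", "qubits."])
print(repr(read_at_most(s, 10)))  # prints 'Quantum computing '
print(s.closed)                   # prints True
```

The Python SDK's stream object also has a `close()` method, but it yields chunk objects rather than strings, so in real code you would pull `chunk.choices[0].delta.content` inside the loop.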

4. Request parameters

Core fields for the chat.completions endpoint. The full spec is in the API reference.

model (string, required): Model ID such as qwen/qwen3.7-plus; families include GPT, Claude Sonnet, Gemini, and DeepSeek. See the model registry for the full list.

messages (array, required): Conversation history as {role, content} objects. Roles: system, user, assistant, tool.

temperature (number, optional): Sampling temperature, 0 to 2. Lower values are more deterministic. Default 1.

max_tokens (integer, optional): Hard cap on output tokens. If omitted, output is limited only by the model's maximum context length.

stream (boolean, optional): If true, partial deltas are sent over SSE. Default false.

tools (array, optional): A list of tools the model may call. See function calling.

response_format (object, optional): Forces JSON output: { "type": "json_object" }, or a JSON Schema for strict structured output.
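Checking a payload client-side catches the most common mistakes before they cost a round trip. This sketch enforces only the constraints listed above (required `model` and `messages`, the four roles, `temperature` between 0 and 2, a boolean `stream`) plus one assumption of ours: that `max_tokens` must be a positive integer. `validate_chat_params` is hypothetical, and the server remains the source of truth.

```python
from typing import Any, Dict, List

ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def validate_chat_params(params: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    errors: List[str] = []
    if not isinstance(params.get("model"), str) or not params["model"]:
        errors.append("model: required non-empty string")
    messages = params.get("messages")
    if not isinstance(messages, list) or not messages:
        errors.append("messages: required non-empty array")
    else:
        for i, msg in enumerate(messages):
            if not isinstance(msg, dict) or msg.get("role") not in ALLOWED_ROLES:
                errors.append(f"messages[{i}].role: must be one of {sorted(ALLOWED_ROLES)}")
    if "temperature" in params:
        t = params["temperature"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(t, bool) or not isinstance(t, (int, float)) or not 0 <= t <= 2:
            errors.append("temperature: number between 0 and 2")
    if "max_tokens" in params:
        m = params["max_tokens"]
        # Assumption: the cap must be at least 1 token.
        if isinstance(m, bool) or not isinstance(m, int) or m < 1:
            errors.append("max_tokens: positive integer")
    if "stream" in params and not isinstance(params["stream"], bool):
        errors.append("stream: boolean")
    return errors


ok = {
    "model": "qwen/qwen3.7-plus",
    "messages": [{"role": "user", "content": "Hi"}],
    "temperature": 0.7,
}
print(validate_chat_params(ok))  # prints []
```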

5. Response codes

Bentoo AI follows standard HTTP semantics. Errors return a JSON body with error.code and error.message.

200 OK: Successful completion. Body contains choices[] and usage.

401 Unauthorized: Missing or invalid API key.

402 Payment Required: Account out of credits. Top up in the dashboard.

429 Rate Limit: Too many requests. Honor the Retry-After header and back off.

503 Upstream: Provider unavailable. We automatically retry on a healthy mirror, and the second attempt almost always succeeds.
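A client can turn these codes into a retry policy: wait out Retry-After on 429, back off exponentially on 5xx, and fail fast on 401 and 402, where retrying cannot help. The sketch assumes Retry-After carries either a number of seconds or an HTTP date (the two forms HTTP allows). `retry_delay`, `error_message`, the 0.5 s base delay, and the 30 s cap are all hypothetical choices, not documented Bentoo values.

```python
import json
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def retry_delay(
    status: int,
    headers: Mapping[str, str],
    attempt: int,
    base: float = 0.5,
    cap: float = 30.0,
    now: Optional[datetime] = None,
) -> Optional[float]:
    """Seconds to wait before retrying, or None when a retry will not help.

    For real responses pass a case-insensitive mapping (e.g. the
    http.client.HTTPMessage urllib returns); a plain dict needs "Retry-After".
    """
    backoff = min(cap, base * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ... capped
    if status == 429:
        value = headers.get("Retry-After")
        if value is None:
            return backoff
        value = value.strip()
        if value.isdigit():
            return float(value)  # delay-seconds form
        try:
            when = parsedate_to_datetime(value)  # HTTP-date form
        except (TypeError, ValueError):
            return backoff  # unparseable header: fall back to backoff
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        now = now or datetime.now(timezone.utc)
        return max(0.0, (when - now).total_seconds())
    if 500 <= status < 600:
        return backoff
    # 2xx needs no retry; 401, 402 and other 4xx will fail the same way again.
    return None


def error_message(body: bytes) -> str:
    """Format the documented {"error": {"code", "message"}} body for logs."""
    err = json.loads(body).get("error", {})
    return f"{err.get('code')}: {err.get('message')}"
```

Keep client-side retries modest for 503: the server already retries once on a mirror before you see the error.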

Smart fallback is on by default. If the primary provider returns a 5xx error or times out, Bentoo retries on a mirror with the same model identity. You can disable this with the header X-Bentoo-Fallback: off.
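For raw HTTP calls, opting out is just one extra header. This minimal sketch uses a hypothetical `bentoo_headers` helper that builds the request headers and adds X-Bentoo-Fallback: off only when asked, so the default keeps smart fallback enabled.

```python
from typing import Dict


def bentoo_headers(api_key: str, fallback: bool = True) -> Dict[str, str]:
    """Headers for a raw HTTP call. Hypothetical helper, not a Bentoo SDK API."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if not fallback:
        # Opt out of the automatic mirror retry on 5xx or timeout.
        headers["X-Bentoo-Fallback"] = "off"
    return headers


print(bentoo_headers("sk-xxxxxxxxxxxxxx", fallback=False)["X-Bentoo-Fallback"])  # prints off
```

With the OpenAI Python SDK, you can send the same header on a single call with `client.chat.completions.create(..., extra_headers={"X-Bentoo-Fallback": "off"})`, or on every call by passing `default_headers` to the `OpenAI(...)` constructor.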
