Providers

Connect to 13+ LLM providers — open-source models first, plus every major proprietary API.

Nyzhi abstracts LLM access behind a unified provider interface. Every provider supports streaming chat completions with tool use, and many support thinking or reasoning modes. Configure your preferred providers and models, then let Nyzhi handle authentication and routing.
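
As a mental model, the interface every provider implements can be sketched as a single streaming entry point. The names below (ChatProvider, Chunk, stream_chat) are invented for illustration and are not Nyzhi's actual API:

from dataclasses import dataclass
from typing import Iterator, Protocol

@dataclass
class Chunk:
    """One streamed event: completion text, optional thinking, optional tool call."""
    text: str = ""
    thinking: str = ""             # reasoning tokens, for providers that expose them
    tool_call: dict | None = None  # populated when the model invokes a tool

class ChatProvider(Protocol):
    def stream_chat(
        self,
        model: str,
        messages: list[dict],
        tools: list[dict] | None = None,
    ) -> Iterator[Chunk]:
        """Stream a chat completion, optionally offering tools the model may call."""
        ...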


Open-Source & Open-Weight Providers

These providers serve open-weight models — you get full transparency into the model architecture and weights.

Kimi (Moonshot)

API style: openai
Base URL: https://api.moonshot.ai/v1
Auth: MOONSHOT_API_KEY

Model                   Context Window   Thinking
kimi-k2.5               –                Kimi thinking
kimi-k2-0905-preview    –                Kimi thinking
kimi-k2-turbo-preview   –                Kimi thinking

Kimi models use a dedicated thinking format. Nyzhi detects Kimi models by name prefix and automatically adjusts the request:

[provider]
default = "kimi"

[provider.kimi]
model = "kimi-k2.5"
# api_key via MOONSHOT_API_KEY env var

Thinking is enabled with thinking: { type: "enabled" }, temperature is set to 1.0, and top_p to 0.95 — all handled automatically.
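
Conceptually, the adjustment is equivalent to the following sketch (the function name is invented; the field values are the ones listed above):

def adjust_for_kimi(model: str, payload: dict) -> dict:
    """Apply Kimi's dedicated thinking format to an OpenAI-style request payload."""
    if model.startswith("kimi-"):  # Nyzhi detects Kimi models by name prefix
        payload["thinking"] = {"type": "enabled"}
        payload["temperature"] = 1.0
        payload["top_p"] = 0.95
    return payload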

Kimi Coding Plan is also available as a separate provider using the Anthropic API style at https://api.kimi.com/coding (env var: KIMI_CODING_API_KEY).


MiniMax

API style: openai
Base URL: https://api.minimax.io/v1
Auth: MINIMAX_API_KEY

Model                    Tier     Notes
MiniMax-M2.5             High     Latest flagship
MiniMax-M2.5-highspeed   Medium   Optimized for throughput
MiniMax-M2.1             Medium   Previous generation

[provider]
default = "minimax"

[provider.minimax]
model = "MiniMax-M2.5"

MiniMax Coding Plan uses the Anthropic API style at https://api.minimax.io/anthropic (env var: MINIMAX_CODING_API_KEY).


GLM (Z.ai / Zhipu)

API style: openai
Base URL: https://api.z.ai/api/paas/v4
Auth: ZHIPU_API_KEY

Model            Tier     Notes
glm-5            High     Latest flagship
glm-5-code       High     Code-specialized
glm-4.7          Medium   General purpose
glm-4.7-flashx   Low      Fast inference

[provider]
default = "glm"

[provider.glm]
model = "glm-5-code"

GLM Coding Plan is available at https://api.z.ai/api/coding/paas/v4 (env var: ZHIPU_CODING_API_KEY).


DeepSeek

API style: openai
Base URL: https://api.deepseek.com/v1
Auth: DEEPSEEK_API_KEY

Model                    Tier   Notes
deepseek-chat (V3.2)     High   General purpose
deepseek-reasoner (R1)   High   Chain-of-thought reasoning

[provider]
default = "deepseek"

[provider.deepseek]
model = "deepseek-chat"

Groq

API style: openai
Base URL: https://api.groq.com/openai/v1
Auth: GROQ_API_KEY

Groq provides hardware-accelerated inference for open-weight models:

Model                     Tier   Notes
llama-3.3-70b-versatile   High   Llama 3.3 70B
llama-3.1-8b-instant      Low    Ultra-fast small model

[provider]
default = "groq"

[provider.groq]
model = "llama-3.3-70b-versatile"

Together AI

API style: openai
Base URL: https://api.together.xyz/v1
Auth: TOGETHER_API_KEY

Together hosts a wide range of open-weight models with competitive pricing:

Model              Tier   Notes
Llama 4 Maverick   High   Meta’s latest
Qwen3 235B A22B    High   Alibaba’s MoE model
DeepSeek R1        High   Reasoning model

[provider]
default = "together"

[provider.together]
model = "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8"

Ollama (Local)

API style: openai
Base URL: http://localhost:11434/v1
Auth: None required (local)

Run models entirely on your own hardware — no API keys, no network calls, full privacy:

Model          Notes
qwen3:32b      Strong general-purpose
llama3.3:70b   Meta Llama 3.3
devstral:24b   Mistral’s code model

[provider]
default = "ollama"

[provider.ollama]
model = "qwen3:32b"

Any model available via ollama pull works automatically.
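
Since Ollama speaks the OpenAI wire format, you can sanity-check a pulled model with a plain HTTP request before pointing Nyzhi at it. A minimal sketch using the requests library:

import requests

resp = requests.post(
    "http://localhost:11434/v1/chat/completions",
    # No Authorization header: the server runs locally and needs no key.
    json={
        "model": "qwen3:32b",
        "messages": [{"role": "user", "content": "Say hello."}],
    },
    timeout=300,
)
print(resp.json()["choices"][0]["message"]["content"])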


Proprietary Providers

OpenAI

API style: openai (Bearer token auth)
Auth: OPENAI_API_KEY or OAuth device code flow

Model           Context Window   Tier     Thinking
GPT-5.3 Codex   1,048,576        High     ReasoningEffort
GPT-5.2 Codex   1,048,576        High     ReasoningEffort
GPT-5.2         1,048,576        High     ReasoningEffort
o3              200,000          High     ReasoningEffort
o4-mini         200,000          Medium   ReasoningEffort

Thinking support: the reasoning_effort parameter (low/medium/high). All listed models, including o3 and o4-mini, enable thinking through OpenAI’s reasoning-effort setting.
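
At the wire level, the reasoning-effort request is roughly the following (a sketch with the requests library, not Nyzhi's own client code):

import os
import requests

resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "o4-mini",
        "reasoning_effort": "medium",  # "low", "medium", or "high"
        "messages": [{"role": "user", "content": "Plan a refactor of this module."}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])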

Prompt caching: Automatic (OpenAI caches internally). Cache read/creation tokens are tracked in usage.


Anthropic

API style: anthropic (x-api-key header auth)
Auth: ANTHROPIC_API_KEY or OAuth PKCE flow

Model               Context Window   Tier     Thinking
Claude Opus 4.6     200,000          High     AdaptiveEffort
Claude Sonnet 4.6   200,000          Medium   AdaptiveEffort
Claude Haiku 4.5    200,000          Low      None

Thinking support: Adaptive effort via thinking.budget_tokens. The system prompt and the last tool result are marked with cache_control: { type: "ephemeral" } for prompt caching.

Special handling: System messages are extracted from the message array and sent as a separate system parameter (Anthropic API requirement).
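
Taken together, the request Nyzhi builds resembles this sketch (illustrative; the model ID and prompts are examples, and the requests library stands in for Nyzhi's client):

import os
import requests

resp = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers={
        "x-api-key": os.environ["ANTHROPIC_API_KEY"],
        "anthropic-version": "2023-06-01",
    },
    json={
        "model": "claude-sonnet-4-6",
        "max_tokens": 4096,
        # System prompt extracted from the message array into its own
        # parameter, and marked ephemeral for prompt caching.
        "system": [{
            "type": "text",
            "text": "You are a careful coding assistant.",
            "cache_control": {"type": "ephemeral"},
        }],
        "messages": [{"role": "user", "content": "Refactor this function."}],
        "thinking": {"type": "enabled", "budget_tokens": 8192},
    },
)
print(resp.json()["content"])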


Gemini

API style: gemini
Auth: GEMINI_API_KEY or OAuth Bearer token (Google PKCE flow)

Model              Context Window   Tier     Thinking
Gemini 3.1 Pro     1,048,576        High     BudgetTokens
Gemini 3 Flash     1,048,576        Low      BudgetTokens
Gemini 3 Pro       1,048,576        High     BudgetTokens
Gemini 2.5 Flash   1,048,576        Medium   BudgetTokens

Thinking support: thinkingConfig.thinkingBudget parameter for models that support extended thinking.

Dual auth mode: API key appends ?key=... to the URL. Bearer token uses the Authorization header.
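
The two auth modes attach the same credential in different places, as in this sketch (requests library; GOOGLE_OAUTH_TOKEN is a placeholder name):

import os
import requests

BASE = "https://generativelanguage.googleapis.com/v1beta"
body = {
    "contents": [{"role": "user", "parts": [{"text": "Hello"}]}],
    "generationConfig": {"thinkingConfig": {"thinkingBudget": 8192}},
}

# API-key mode: the key travels as a query parameter.
requests.post(f"{BASE}/models/gemini-2.5-flash:generateContent",
              params={"key": os.environ["GEMINI_API_KEY"]}, json=body)

# OAuth mode: the token travels in the Authorization header.
requests.post(f"{BASE}/models/gemini-2.5-flash:generateContent",
              headers={"Authorization": f"Bearer {os.environ['GOOGLE_OAUTH_TOKEN']}"},
              json=body)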


Antigravity (Cloud Code)

API style: gemini
Auth: ANTIGRAVITY_API_KEY
Base URL: https://generativelanguage.googleapis.com/v1beta

Access multiple providers through a single API with a free tier:

Model                      Notes
gemini-3.1-pro             Google Gemini
gemini-3-flash             Google Gemini
claude-sonnet-4-6          Anthropic Claude
claude-opus-4-6-thinking   Anthropic Claude with thinking
gpt-oss-120b               Open-source composite

Aggregators

OpenRouter

API style: openai
Auth: OPENROUTER_API_KEY

Use OpenRouter to access any model available on their platform:

[provider]
default = "openrouter"

[provider.openrouter]
model = "anthropic/claude-sonnet-4-20250514"

Custom Providers

Any OpenAI-compatible API can be added:

[provider.my-local-llm]
base_url = "http://localhost:8080/v1"
api_key = "not-needed"
api_style = "openai"
env_var = "MY_LLM_KEY"

The api_style determines which implementation is used:

api_style   Implementation    Auth Header
openai      OpenAI-style      Authorization: Bearer <key>
anthropic   Anthropic-style   x-api-key: <key>
gemini      Gemini-style      Query param or Bearer
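
One way to picture the dispatch (the function is hypothetical, but the header conventions are the ones in the table):

def auth_headers(api_style: str, key: str) -> dict[str, str]:
    """Map an api_style to its auth header convention (illustrative only)."""
    if api_style == "openai":
        return {"Authorization": f"Bearer {key}"}
    if api_style == "anthropic":
        return {"x-api-key": key}
    if api_style == "gemini":
        return {}  # key usually goes in a ?key=... query parameter instead
    raise ValueError(f"unknown api_style: {api_style!r}")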

Thinking and Reasoning

Different providers use different thinking mechanisms:

Type              Provider                         Parameter
ReasoningEffort   OpenAI (o3, o4-mini, GPT-5.x)    reasoning_effort: "low"/"medium"/"high"
AdaptiveEffort    Anthropic (Claude Opus/Sonnet)   thinking.budget_tokens: N
BudgetTokens      Gemini                           thinkingConfig.thinkingBudget: N
ThinkingLevel     Kimi                             thinking: { type: "enabled" }

Thinking can be configured in the agent:

[agent]
# thinking_enabled = true
# thinking_budget = 10000
# reasoning_effort = "medium"

Model Registry

The model registry aggregates models from all configured providers:

  • Models per provider — list models for a specific provider
  • All models — list every available model
  • Find by ID — lookup by exact model ID or across all providers
  • Provider list — list provider names with models

Each model carries metadata: ID, display name, provider, context window, max output tokens, pricing per million tokens, quality tier, and optional thinking support.
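
Each entry can be modeled as a record like the following (field names invented for illustration; the fields mirror the metadata listed above):

from dataclasses import dataclass

@dataclass
class ModelInfo:
    id: str                        # exact model ID, e.g. "deepseek-chat"
    display_name: str
    provider: str
    context_window: int            # tokens
    max_output_tokens: int
    input_price_per_mtok: float    # USD per million input tokens
    output_price_per_mtok: float   # USD per million output tokens
    tier: str                      # "low" / "medium" / "high"
    thinking: str | None = None    # e.g. "ReasoningEffort", or None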


Stub Providers

Two providers exist as stubs for future integration:

  • Claude SDK (claude-sdk): Placeholder for Claude Agent SDK integration. Returns an error instructing to install Claude Code CLI.
  • Codex (codex): Placeholder for OpenAI Codex CLI integration. Checks for the codex binary on PATH.

Neither is usable in the current release.