Nyzhi abstracts LLM access behind a unified provider interface. Every provider supports streaming chat completions with tool use, and many support thinking or reasoning modes. Configure your preferred providers and models, then let Nyzhi handle authentication and routing.
Open-Source & Open-Weight Providers
These providers serve open-weight models — you get full transparency into the model architecture and weights.
Kimi (Moonshot)
API style: openai
Base URL: https://api.moonshot.ai/v1
Auth: MOONSHOT_API_KEY
| Model | Context Window | Thinking |
|---|---|---|
| kimi-k2.5 | — | ThinkingLevel |
| kimi-k2-0905-preview | — | ThinkingLevel |
| kimi-k2-turbo-preview | — | ThinkingLevel |
Kimi models use a dedicated thinking format. Nyzhi detects Kimi models by name prefix and automatically adjusts the request:
[provider]
default = "kimi"
[provider.kimi]
model = "kimi-k2.5"
# api_key via MOONSHOT_API_KEY env var
Thinking is enabled with thinking: { type: "enabled" }, temperature is set to 1.0, and top_p to 0.95 — all handled automatically.
Kimi Coding Plan is also available as a separate provider using the Anthropic API style at https://api.kimi.com/coding (env var: KIMI_CODING_API_KEY).
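A coding-plan provider can be declared explicitly; the provider name kimi-coding below is illustrative, and the same shape works for the MiniMax and GLM coding plans described later, with their respective base URLs and env vars:
[provider]
default = "kimi-coding"
[provider.kimi-coding]
base_url = "https://api.kimi.com/coding"
api_style = "anthropic"
env_var = "KIMI_CODING_API_KEY"
model = "kimi-k2.5"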
MiniMax
API style: openai
Base URL: https://api.minimax.io/v1
Auth: MINIMAX_API_KEY
| Model | Tier | Notes |
|---|---|---|
| MiniMax-M2.5 | High | Latest flagship |
| MiniMax-M2.5-highspeed | Medium | Optimized for throughput |
| MiniMax-M2.1 | Medium | Previous generation |
[provider]
default = "minimax"
[provider.minimax]
model = "MiniMax-M2.5"
MiniMax Coding Plan uses the Anthropic API style at https://api.minimax.io/anthropic (env var: MINIMAX_CODING_API_KEY).
GLM (Z.ai / Zhipu)
API style: openai
Base URL: https://api.z.ai/api/paas/v4
Auth: ZHIPU_API_KEY
| Model | Tier | Notes |
|---|---|---|
| glm-5 | High | Latest flagship |
| glm-5-code | High | Code-specialized |
| glm-4.7 | Medium | General purpose |
| glm-4.7-flashx | Low | Fast inference |
[provider]
default = "glm"
[provider.glm]
model = "glm-5-code"
GLM Coding Plan is available at https://api.z.ai/api/coding/paas/v4 (env var: ZHIPU_CODING_API_KEY).
DeepSeek
API style: openai
Base URL: https://api.deepseek.com/v1
Auth: DEEPSEEK_API_KEY
| Model | Tier | Notes |
|---|---|---|
| deepseek-chat (V3.2) | High | General purpose |
| deepseek-reasoner (R1) | High | Chain-of-thought reasoning |
[provider]
default = "deepseek"
[provider.deepseek]
model = "deepseek-chat"
Groq
API style: openai
Base URL: https://api.groq.com/openai/v1
Auth: GROQ_API_KEY
Groq provides hardware-accelerated inference for open-weight models:
| Model | Tier | Notes |
|---|---|---|
| llama-3.3-70b-versatile | High | Llama 3.3 70B |
| llama-3.1-8b-instant | Low | Ultra-fast small model |
[provider]
default = "groq"
[provider.groq]
model = "llama-3.3-70b-versatile"
Together AI
API style: openai
Base URL: https://api.together.xyz/v1
Auth: TOGETHER_API_KEY
Together hosts a wide range of open-weight models with competitive pricing:
| Model | Tier | Notes |
|---|---|---|
| Llama 4 Maverick | High | Meta’s latest |
| Qwen3 235B A22B | High | Alibaba’s MoE model |
| DeepSeek R1 | High | Reasoning model |
[provider]
default = "together"
[provider.together]
model = "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8"
Ollama (Local)
API style: openai
Base URL: http://localhost:11434/v1
Auth: None required (local)
Run models entirely on your own hardware — no API keys, no network calls, full privacy:
| Model | Notes |
|---|---|
| qwen3:32b | Strong general-purpose |
| llama3.3:70b | Meta Llama 3.3 |
| devstral:24b | Mistral’s code model |
[provider]
default = "ollama"
[provider.ollama]
model = "qwen3:32b"
Any model available via ollama pull works automatically.
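Ollama running on another machine works too; this sketch assumes the base_url override shown in the Custom Providers section also applies to built-in providers:
[provider.ollama]
model = "llama3.3:70b"
base_url = "http://192.168.1.50:11434/v1"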
Proprietary Providers
OpenAI
API style: openai (Bearer token auth)
Auth: OPENAI_API_KEY or OAuth device code flow
| Model | Context Window | Tier | Thinking |
|---|---|---|---|
| GPT-5.3 Codex | 1,048,576 | High | ReasoningEffort |
| GPT-5.2 Codex | 1,048,576 | High | ReasoningEffort |
| GPT-5.2 | 1,048,576 | High | ReasoningEffort |
| o3 | 200,000 | High | ReasoningEffort |
| o4-mini | 200,000 | Medium | ReasoningEffort |
Thinking support: OpenAI’s reasoning_effort parameter (low/medium/high), used by all listed models including o3 and o4-mini.
Prompt caching: Automatic (OpenAI caches internally). Cache read/creation tokens are tracked in usage.
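A sketch of a working setup; the provider name openai is assumed, and reasoning_effort comes from the agent options under Thinking and Reasoning:
[provider]
default = "openai"
[provider.openai]
model = "o3"
# api_key via OPENAI_API_KEY env var
[agent]
reasoning_effort = "medium"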
Anthropic
API style: anthropic (x-api-key header auth)
Auth: ANTHROPIC_API_KEY or OAuth PKCE flow
| Model | Context Window | Tier | Thinking |
|---|---|---|---|
| Claude Opus 4.6 | 200,000 | High | AdaptiveEffort |
| Claude Sonnet 4.6 | 200,000 | Medium | AdaptiveEffort |
| Claude Haiku 4.5 | 200,000 | Low | None |
Thinking support: Adaptive effort via thinking.budget_tokens.
Prompt caching: The system prompt and the last tool result are marked with cache_control: { type: "ephemeral" }.
Special handling: System messages are extracted from the message array and sent as a separate system parameter (Anthropic API requirement).
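An equivalent sketch for Claude, assuming the provider name anthropic and the agent-level thinking options from Thinking and Reasoning:
[provider]
default = "anthropic"
[provider.anthropic]
model = "claude-sonnet-4-6"
# api_key via ANTHROPIC_API_KEY env var
[agent]
thinking_enabled = true
thinking_budget = 10000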
Gemini
API style: gemini
Auth: GEMINI_API_KEY or OAuth Bearer token (Google PKCE flow)
| Model | Context Window | Tier | Thinking |
|---|---|---|---|
| Gemini 3.1 Pro | 1,048,576 | High | BudgetTokens |
| Gemini 3 Flash | 1,048,576 | Low | BudgetTokens |
| Gemini 3 Pro | 1,048,576 | High | BudgetTokens |
| Gemini 2.5 Flash | 1,048,576 | Medium | BudgetTokens |
Thinking support: thinkingConfig.thinkingBudget parameter for models that support extended thinking.
Dual auth mode: an API key is appended to the request URL as ?key=...; an OAuth Bearer token is sent in the Authorization header.
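And for Gemini, assuming the provider name gemini:
[provider]
default = "gemini"
[provider.gemini]
model = "gemini-3-flash"
# api_key via GEMINI_API_KEY env var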
Antigravity (Cloud Code)
API style: gemini
Auth: ANTIGRAVITY_API_KEY
Base URL: https://generativelanguage.googleapis.com/v1beta
Access multiple providers through a single API with a free tier:
| Model | Notes |
|---|---|
| gemini-3.1-pro | Google Gemini |
| gemini-3-flash | Google Gemini |
| claude-sonnet-4-6 | Anthropic Claude |
| claude-opus-4-6-thinking | Anthropic Claude with thinking |
| gpt-oss-120b | Open-source composite |
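Assuming the provider name antigravity, routing to a Claude model through the free tier might look like:
[provider]
default = "antigravity"
[provider.antigravity]
model = "claude-opus-4-6-thinking"
# api_key via ANTIGRAVITY_API_KEY env var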
Aggregators
OpenRouter
API style: openai
Auth: OPENROUTER_API_KEY
Use OpenRouter to access any model available on their platform:
[provider]
default = "openrouter"
[provider.openrouter]
model = "anthropic/claude-sonnet-4-20250514"
Custom Providers
Any OpenAI-compatible API can be added:
[provider.my-local-llm]
base_url = "http://localhost:8080/v1"
api_key = "not-needed"   # placeholder; local servers typically ignore the key
api_style = "openai"
env_var = "MY_LLM_KEY"   # optional: env var to read the key from, if set
The api_style determines which implementation is used:
| api_style | Implementation | Auth Header |
|---|---|---|
| openai | OpenAI-style | Authorization: Bearer <key> |
| anthropic | Anthropic-style | x-api-key: <key> |
| gemini | Gemini-style | Query param or Bearer |
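For example, a self-hosted gateway that speaks the Anthropic protocol could be registered like this (the name, URL, and env var are placeholders):
[provider.my-gateway]
base_url = "https://gateway.example.com/v1"
api_style = "anthropic"
env_var = "MY_GATEWAY_KEY"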
Thinking and Reasoning
Different providers use different thinking mechanisms:
| Type | Provider | Parameter |
|---|---|---|
| ReasoningEffort | OpenAI (o3, o4-mini, GPT-5.x) | reasoning_effort: "low"/"medium"/"high" |
| AdaptiveEffort | Anthropic (Claude Opus/Sonnet) | thinking.budget_tokens: N |
| BudgetTokens | Gemini | thinkingConfig.thinkingBudget: N |
| ThinkingLevel | Kimi | thinking: { type: "enabled" } |
Thinking can be configured in the agent:
[agent]
# thinking_enabled = true
# thinking_budget = 10000
# reasoning_effort = "medium"
Model Registry
The model registry aggregates models from all configured providers:
- Models per provider — list models for a specific provider
- All models — list every available model
- Find by ID — lookup by exact model ID or across all providers
- Provider list — list provider names with models
Each model carries metadata: ID, display name, provider, context window, max output tokens, pricing per million tokens, quality tier, and optional thinking support.
Stub Providers
Two providers exist as stubs for future integration:
- Claude SDK (claude-sdk): Placeholder for Claude Agent SDK integration. Returns an error instructing you to install the Claude Code CLI.
- Codex (codex): Placeholder for OpenAI Codex CLI integration. Checks for the codex binary on PATH.
Neither is usable in the current release.