Nyzhi abstracts LLM access behind a unified provider interface. Every provider supports streaming chat completions with tool use, and many support thinking or reasoning modes. Configure your preferred providers and models, then let Nyzhi handle authentication and routing.
Open-Source & Open-Weight Providers
These providers serve open-weight models — you get full transparency into the model architecture and weights.
Kimi (Moonshot)
API style: openai
Base URL: https://api.moonshot.ai/v1
Auth: MOONSHOT_API_KEY
| Model | Context Window | Thinking |
|---|---|---|
| kimi-k2.5 | — | ThinkingLevel |
| kimi-k2-0905-preview | — | ThinkingLevel |
| kimi-k2-turbo-preview | — | ThinkingLevel |
Kimi models use a dedicated thinking format. Nyzhi detects Kimi models by name prefix and automatically adjusts the request:
[provider]
default = "kimi"
[provider.kimi]
model = "kimi-k2.5"
# api_key via MOONSHOT_API_KEY env var
Thinking is enabled with thinking: { type: "enabled" }, temperature is set to 1.0, and top_p to 0.95 — all handled automatically.
Kimi Coding Plan is also available as a separate provider using the Anthropic API style at https://api.kimi.com/coding (env var: KIMI_CODING_API_KEY).
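A coding-plan provider can be declared explicitly; the provider name kimi-coding below is illustrative, and the same shape works for the MiniMax and GLM coding plans described later, with their respective base URLs and env vars:
[provider]
default = "kimi-coding"
[provider.kimi-coding]
base_url = "https://api.kimi.com/coding"
api_style = "anthropic"
env_var = "KIMI_CODING_API_KEY"
model = "kimi-k2.5"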
MiniMax
API style: openai
Base URL: https://api.minimax.io/v1
Auth: MINIMAX_API_KEY
| Model | Tier | Notes |
|---|---|---|
| MiniMax-M2.5 | High | Latest flagship |
| MiniMax-M2.5-highspeed | Medium | Optimized for throughput |
| MiniMax-M2.1 | Medium | Previous generation |
[provider]
default = "minimax"
[provider.minimax]
model = "MiniMax-M2.5"
MiniMax Coding Plan uses the Anthropic API style at https://api.minimax.io/anthropic (env var: MINIMAX_CODING_API_KEY).
GLM (Z.ai / Zhipu)
API style: openai
Base URL: https://api.z.ai/api/paas/v4
Auth: ZHIPU_API_KEY
| Model | Tier | Notes |
|---|---|---|
| glm-5 | High | Latest flagship |
| glm-5-code | High | Code-specialized |
| glm-4.7 | Medium | General purpose |
| glm-4.7-flashx | Low | Fast inference |
[provider]
default = "glm"
[provider.glm]
model = "glm-5-code"
GLM Coding Plan is available at https://api.z.ai/api/coding/paas/v4 (env var: ZHIPU_CODING_API_KEY).
DeepSeek
API style: openai
Base URL: https://api.deepseek.com/v1
Auth: DEEPSEEK_API_KEY
| Model | Tier | Notes |
|---|---|---|
| deepseek-chat (V3.2) | High | General purpose |
| deepseek-reasoner (R1) | High | Chain-of-thought reasoning |
[provider]
default = "deepseek"
[provider.deepseek]
model = "deepseek-chat"
Groq
API style: openai
Base URL: https://api.groq.com/openai/v1
Auth: GROQ_API_KEY
Groq provides hardware-accelerated inference for open-weight models:
| Model | Tier | Notes |
|---|---|---|
| llama-3.3-70b-versatile | High | Llama 3.3 70B |
| llama-3.1-8b-instant | Low | Ultra-fast small model |
[provider]
default = "groq"
[provider.groq]
model = "llama-3.3-70b-versatile"
Together AI
API style: openai
Base URL: https://api.together.xyz/v1
Auth: TOGETHER_API_KEY
Together hosts a wide range of open-weight models with competitive pricing:
| Model | Tier | Notes |
|---|---|---|
| Llama 4 Maverick | High | Meta’s latest |
| Qwen3 235B A22B | High | Alibaba’s MoE model |
| DeepSeek R1 | High | Reasoning model |
[provider]
default = "together"
[provider.together]
model = "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8"
Ollama (Local)
API style: openai
Base URL: http://localhost:11434/v1
Auth: None required (local)
Run models entirely on your own hardware — no API keys, no network calls, full privacy:
| Model | Notes |
|---|---|
| qwen3:32b | Strong general-purpose |
| llama3.3:70b | Meta Llama 3.3 |
| devstral:24b | Mistral’s code model |
[provider]
default = "ollama"
[provider.ollama]
model = "qwen3:32b"
Any model available via ollama pull works automatically.
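Ollama running on another machine works too; this sketch assumes the base_url override shown in the Custom Providers section also applies to built-in providers:
[provider.ollama]
model = "llama3.3:70b"
base_url = "http://192.168.1.50:11434/v1"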
Proprietary Providers
OpenAI
API style: openai (Bearer token auth)
Auth: OPENAI_API_KEY or OAuth device code flow
| Model | Context Window | Tier | Thinking |
|---|---|---|---|
| GPT-5.3 Codex | 1,048,576 | High | ReasoningEffort |
| GPT-5.2 Codex | 1,048,576 | High | ReasoningEffort |
| GPT-5.2 | 1,048,576 | High | ReasoningEffort |
| o3 | 200,000 | High | ReasoningEffort |
| o4-mini | 200,000 | Medium | ReasoningEffort |
Thinking support: OpenAI’s reasoning_effort parameter (low/medium/high), used by all listed models including o3 and o4-mini.
Prompt caching: Automatic (OpenAI caches internally). Cache read/creation tokens are tracked in usage.
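A sketch of a working setup; the provider name openai is assumed, and reasoning_effort comes from the agent options under Thinking and Reasoning:
[provider]
default = "openai"
[provider.openai]
model = "o3"
# api_key via OPENAI_API_KEY env var
[agent]
reasoning_effort = "medium"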
Anthropic
API style: anthropic (x-api-key header auth)
Auth: ANTHROPIC_API_KEY or OAuth PKCE flow
| Model | Context Window | Tier | Thinking |
|---|---|---|---|
| Claude Opus 4.6 | 200,000 | High | AdaptiveEffort |
| Claude Sonnet 4.6 | 200,000 | Medium | AdaptiveEffort |
| Claude Haiku 4.5 | 200,000 | Low | None |
Thinking support: Adaptive effort via thinking.budget_tokens.
Prompt caching: The system prompt and the last tool result are marked with cache_control: { type: "ephemeral" }.
Special handling: System messages are extracted from the message array and sent as a separate system parameter (Anthropic API requirement).
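An equivalent sketch for Claude, assuming the provider name anthropic and the agent-level thinking options from Thinking and Reasoning:
[provider]
default = "anthropic"
[provider.anthropic]
model = "claude-sonnet-4-6"
# api_key via ANTHROPIC_API_KEY env var
[agent]
thinking_enabled = true
thinking_budget = 10000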
Gemini
API style: gemini
Auth: GEMINI_API_KEY or OAuth Bearer token (Google PKCE flow)
| Model | Context Window | Tier | Thinking |
|---|---|---|---|
| Gemini 3.1 Pro | 1,048,576 | High | BudgetTokens |
| Gemini 3 Flash | 1,048,576 | Low | BudgetTokens |
| Gemini 3 Pro | 1,048,576 | High | BudgetTokens |
| Gemini 2.5 Flash | 1,048,576 | Medium | BudgetTokens |
Thinking support: thinkingConfig.thinkingBudget parameter for models that support extended thinking.
Dual auth mode: an API key is appended to the request URL as ?key=...; an OAuth Bearer token is sent in the Authorization header.
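And for Gemini, assuming the provider name gemini:
[provider]
default = "gemini"
[provider.gemini]
model = "gemini-3-flash"
# api_key via GEMINI_API_KEY env var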
Antigravity (Cloud Code)
API style: gemini
Auth: ANTIGRAVITY_API_KEY
Base URL: https://generativelanguage.googleapis.com/v1beta
Access multiple providers through a single API with a free tier:
| Model | Notes |
|---|---|
| gemini-3.1-pro | Google Gemini |
| gemini-3-flash | Google Gemini |
| claude-sonnet-4-6 | Anthropic Claude |
| claude-opus-4-6-thinking | Anthropic Claude with thinking |
| gpt-oss-120b | Open-source composite |
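Assuming the provider name antigravity, routing to a Claude model through the free tier might look like:
[provider]
default = "antigravity"
[provider.antigravity]
model = "claude-opus-4-6-thinking"
# api_key via ANTIGRAVITY_API_KEY env var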
Aggregators
OpenRouter
API style: openai
Auth: OPENROUTER_API_KEY
Use OpenRouter to access any model available on their platform:
[provider]
default = "openrouter"
[provider.openrouter]
model = "anthropic/claude-sonnet-4-20250514"
Custom Providers
Any OpenAI-compatible API can be added:
[provider.my-local-llm]
base_url = "http://localhost:8080/v1"
api_key = "not-needed"   # placeholder; local servers typically ignore the key
api_style = "openai"
env_var = "MY_LLM_KEY"   # optional: env var to read the key from, if set
The api_style determines which implementation is used:
| api_style | Implementation | Auth Header |
|---|---|---|
| openai | OpenAI-style | Authorization: Bearer <key> |
| anthropic | Anthropic-style | x-api-key: <key> |
| gemini | Gemini-style | Query param or Bearer |
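For example, a self-hosted gateway that speaks the Anthropic protocol could be registered like this (the name, URL, and env var are placeholders):
[provider.my-gateway]
base_url = "https://gateway.example.com/v1"
api_style = "anthropic"
env_var = "MY_GATEWAY_KEY"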
Thinking and Reasoning
Different providers use different thinking mechanisms:
| Type | Provider | Parameter |
|---|---|---|
| ReasoningEffort | OpenAI (o3, o4-mini, GPT-5.x) | reasoning_effort: "low"/"medium"/"high" |
| AdaptiveEffort | Anthropic (Claude Opus/Sonnet) | thinking.budget_tokens: N |
| BudgetTokens | Gemini | thinkingConfig.thinkingBudget: N |
| ThinkingLevel | Kimi | thinking: { type: "enabled" } |
Thinking can be configured in the agent:
[agent]
# thinking_enabled = true
# thinking_budget = 10000
# reasoning_effort = "medium"
Model Registry
The model registry aggregates models from all configured providers:
- Models per provider — list models for a specific provider
- All models — list every available model
- Find by ID — lookup by exact model ID or across all providers
- Provider list — list provider names with models
Each model carries metadata: ID, display name, provider, context window, max output tokens, pricing per million tokens, quality tier, and optional thinking support.
Stub Providers
Two providers exist as stubs for future integration:
- Claude SDK (claude-sdk): Placeholder for Claude Agent SDK integration. Returns an error instructing you to install the Claude Code CLI.
- Codex (codex): Placeholder for OpenAI Codex CLI integration. Checks for the codex binary on PATH.
Neither is usable in the current release.