Use frontier open LLMs like Qwen3 Coder, Kimi K2, DeepSeek V3.2, GLM 4.6 and more in VS Code with GitHub Copilot Chat powered by any OpenAI-compatible provider 🔥
- Supports almost all OpenAI-compatible providers, such as ModelScope, SiliconFlow, DeepSeek...
- Supports vision models.
- Offers additional configuration options for chat requests.
- Supports controlling whether model thinking and reasoning content is shown in the chat interface.
- Supports configuring models from multiple providers simultaneously, managing API keys automatically so you don't have to switch them repeatedly.
- Supports defining multiple configurations for the same model ID with different settings (e.g. thinking enabled/disabled for GLM-4.6).
- Supports an automatic retry mechanism for API errors such as 429, 500, 502, 503, and 504.
- Shows token usage and lets you set provider API keys from the status bar.
- Provides a visual configuration UI for providers and models.
- VS Code 1.104.0 or higher.
- OpenAI-compatible provider API key.
- Install the OAI Compatible Provider for Copilot extension.
- Open VS Code Settings and configure `oaicopilot.baseUrl` and `oaicopilot.models`.
- Open the GitHub Copilot Chat interface.
- Click the model picker and select "Manage Models...".
- Choose "OAI Compatible" provider.
- Enter your API key — it will be saved locally.
- Select the models you want to add to the model picker.
"oaicopilot.baseUrl": "https://api-inference.modelscope.cn/v1",
"oaicopilot.models": [
{
"id": "Qwen/Qwen3-Coder-480B-A35B-Instruct",
"owned_by": "modelscope",
"context_length": 256000,
"max_tokens": 8192,
"temperature": 0,
"top_p": 1
}
]
The extension provides a visual configuration interface that makes it easy to manage global settings, providers, and models without editing JSON files manually.
There are two ways to open the configuration interface:
- From the Command Palette:
  - Press `Ctrl+Shift+P` (or `Cmd+Shift+P` on macOS)
  - Search for "OAICopilot: Open Configuration UI"
  - Select the command to open the configuration panel
- From the Status Bar:
  - Click the "OAICopilot" status bar item in the bottom-right corner of VS Code
- Add a Provider:
  - Click "Add Provider" in the Provider Management section
  - Enter Provider ID: "modelscope"
  - Enter Base URL: "https://api-inference.modelscope.cn/v1"
  - Enter API Key: your ModelScope API key
  - Select API Mode: "openai"
  - Click "Save"
- Add a Model:
  - Click "Add Model" in the Model Management section
  - Select Provider: "modelscope"
  - Enter Model ID: "Qwen/Qwen3-Coder-480B-A35B-Instruct"
  - Configure basic parameters (context length, max tokens, etc.)
  - Click "Save Model"
- Use the Model in VS Code:
  - Open GitHub Copilot Chat (`Ctrl+Shift+I` or `Cmd+Shift+I`)
  - Click the model picker in the chat input
  - Select "Manage Models..."
  - Choose "OAI Compatible" provider
  - Select your configured models
  - Start chatting with the model!
- Important: If you use the configuration UI, the global `oaicopilot.baseUrl` and API key settings are ignored.
- Provider IDs: Use descriptive names that match the service (e.g., "modelscope", "iflow", "anthropic")
- Model IDs: Use the exact model identifier from the provider's documentation
- Config IDs: Use meaningful names like "thinking", "no-thinking", "fast", "accurate" for multiple configurations
- Base URL Overrides: Set model-specific base URLs when using models from different endpoints of the same provider
- Save Frequently: Changes are saved to VS Code settings immediately
- Refresh: Use the "Refresh" buttons to reload current configuration from VS Code settings
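As a rough illustration of these tips, a single model entry can combine a descriptive provider ID, the provider's exact model ID, a meaningful config ID, and a model-specific base URL override. The values below are illustrative, reusing identifiers from the examples in this README:

```json
"oaicopilot.models": [
  {
    "id": "Qwen/Qwen3-Coder-480B-A35B-Instruct",
    "owned_by": "modelscope",
    "configId": "fast",
    "baseUrl": "https://api-inference.modelscope.cn/v1",
    "max_tokens": 8192
  }
]
```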
The extension supports three different API protocols to work with various model providers. You can specify which API mode to use for each model via the `apiMode` parameter.
- `openai` (default) - OpenAI-compatible API
  - Endpoint: `/chat/completions`
  - Header: `Authorization: Bearer <apiKey>`
  - Use for: most OpenAI-compatible providers (ModelScope, SiliconFlow, etc.)
- `ollama` - Ollama native API
  - Endpoint: `/api/chat`
  - Header: `Authorization: Bearer <apiKey>` (or no header for local Ollama)
  - Use for: local Ollama instances
- `anthropic` - Anthropic Claude API
  - Endpoint: `/v1/messages`
  - Header: `x-api-key: <apiKey>`
  - Use for: Anthropic Claude models
Mixed configuration with multiple API modes:
"oaicopilot.models": [
{
"id": "GLM-4.6",
"owned_by": "modelscope"
},
{
"id": "llama3.2",
"owned_by": "ollama",
"baseUrl": "http://localhost:11434",
"apiMode": "ollama"
},
{
"id": "claude-3-5-sonnet-20241022",
"owned_by": "anthropic",
"baseUrl": "https://api.anthropic.com",
"apiMode": "anthropic"
}
]
- The `apiMode` parameter defaults to `"openai"` if not specified.
- When using `ollama` mode, you can omit the API key (it defaults to `ollama`) or set it to any string.
- Each API mode uses different message conversion logic internally to match provider-specific formats (tools, images, thinking).
`owned_by` in the model config is used to group API keys. The storage key is `oaicopilot.apiKey.${owned_by}`.
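For example (model IDs and endpoints are taken from the samples in this README), both entries with `"owned_by": "modelscope"` below share the single key stored under `oaicopilot.apiKey.modelscope`, while the iFlow entry uses its own key under `oaicopilot.apiKey.iflow`:

```json
"oaicopilot.models": [
  { "id": "Qwen/Qwen3-Coder-480B-A35B-Instruct", "owned_by": "modelscope" },
  { "id": "GLM-4.6", "owned_by": "modelscope" },
  { "id": "qwen3-coder", "owned_by": "iflow", "baseUrl": "https://apis.iflow.cn/v1" }
]
```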
- Open VS Code Settings and configure `oaicopilot.models`.
- Open the Command Palette (`Ctrl+Shift+P`) and search for "OAICopilot: Set OAI Compatible Multi-Provider Apikey" to configure provider-specific API keys.
- Open the GitHub Copilot Chat interface.
- Click the model picker and select "Manage Models...".
- Choose "OAI Compatible" provider.
- Select the models you want to add to the model picker.
"oaicopilot.baseUrl": "https://api-inference.modelscope.cn/v1",
"oaicopilot.models": [
{
"id": "Qwen/Qwen3-Coder-480B-A35B-Instruct",
"owned_by": "modelscope",
"context_length": 256000,
"max_tokens": 8192,
"temperature": 0,
"top_p": 1
},
{
"id": "qwen3-coder",
"owned_by": "iflow",
"baseUrl": "https://apis.iflow.cn/v1",
"context_length": 256000,
"max_tokens": 8192,
"temperature": 0,
"top_p": 1
}
]
You can define multiple configurations for the same model ID by using the `configId` field. This allows you to have the same base model with different settings for different use cases.
To use this feature:
- Add the `configId` field to your model configuration
- Each configuration with the same `id` must have a unique `configId`
- The model will appear as separate entries in the VS Code model picker
"oaicopilot.models": [
{
"id": "glm-4.6",
"configId": "thinking",
"owned_by": "zai",
"temperature": 0.7,
"top_p": 1,
"thinking": {
"type": "enabled"
}
},
{
"id": "glm-4.6",
"configId": "no-thinking",
"owned_by": "zai",
"temperature": 0,
"top_p": 1,
"thinking": {
"type": "disabled"
}
}
]
In this example, you'll have two different configurations of the glm-4.6 model available in VS Code:
- `glm-4.6::thinking` - use GLM-4.6 with thinking
- `glm-4.6::no-thinking` - use GLM-4.6 without thinking
You can specify custom HTTP headers that will be sent with every request to a specific model's provider. This is useful for:
- API versioning headers
- Custom authentication headers (in addition to the standard Authorization header)
- Provider-specific headers required by certain APIs
- Request tracking or debugging headers
"oaicopilot.models": [
{
"id": "custom-model",
"owned_by": "provider",
"baseUrl": "https://api.example.com/v1",
"headers": {
"X-API-Version": "2024-01",
"X-Request-Source": "vscode-copilot",
"Custom-Auth-Token": "additional-token-if-needed"
}
}
]
Important Notes:
- Custom headers are merged with default headers (Authorization, Content-Type, User-Agent)
- If a custom header conflicts with a default header, the custom header takes precedence
- Headers are applied on a per-model basis, allowing different headers for different providers
- Header values must be strings
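For instance, the precedence rule above means a custom `Authorization` header replaces the default `Authorization: Bearer <apiKey>` header for that model. A minimal sketch, where the endpoint and token value are placeholders:

```jsonc
"oaicopilot.models": [
  {
    "id": "custom-model",
    "owned_by": "provider",
    "baseUrl": "https://api.example.com/v1",
    "headers": {
      // Conflicts with the default Authorization header, so this value is sent instead
      "Authorization": "Bearer placeholder-gateway-token"
    }
  }
]
```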
The `extra` field allows you to add arbitrary parameters to the API request body. This is useful for provider-specific features that aren't covered by the standard parameters.
- Parameters in `extra` are merged directly into the request body
- Works with all API modes (`openai`, `ollama`, `anthropic`)
- Values can be any valid JSON type (string, number, boolean, object, array)
- OpenAI-specific parameters: `seed`, `logprobs`, `top_logprobs`, `suffix`, `presence_penalty` (if not using the standard parameter)
- Provider-specific features: custom sampling methods, debugging flags
- Experimental parameters: beta features from API providers
"oaicopilot.models": [
{
"id": "custom-model",
"owned_by": "openai",
"extra": {
"seed": 42,
"logprobs": true,
"top_logprobs": 5,
"suffix": "###",
"presence_penalty": 0.1
}
},
{
"id": "local-model",
"owned_by": "ollama",
"baseUrl": "http://localhost:11434",
"apiMode": "ollama",
"extra": {
"keep_alive": "5m",
"raw": true
}
},
{
"id": "claude-model",
"owned_by": "anthropic",
"baseUrl": "https://api.anthropic.com",
"apiMode": "anthropic",
"extra": {
"service_tier": "standard_only"
}
}
]
- Parameters in `extra` are added after standard parameters
- If an `extra` parameter conflicts with a standard parameter, the `extra` value takes precedence
- Use this for provider-specific features only
- Standard parameters (`temperature`, `top_p`, etc.) should use their dedicated fields when possible
- The API provider must support the parameters you specify
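To illustrate the precedence note above (not a recommendation; the dedicated field is normally preferred), the sketch below sets `temperature` both as a standard field and inside `extra`; the `extra` value is the one the provider receives:

```jsonc
"oaicopilot.models": [
  {
    "id": "custom-model",
    "owned_by": "openai",
    "temperature": 0,
    "extra": {
      // Conflicts with the standard field above, so this value takes precedence
      "temperature": 0.9
    }
  }
]
```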
Every parameter can be configured individually per model, providing highly flexible model tuning.
- `id` (required): Model identifier
- `owned_by` (required): Model provider
- `displayName`: Display name for the model that will be shown in the Copilot interface
- `configId`: Configuration ID for this model. Allows defining the same model with different settings (e.g. 'glm-4.6::thinking', 'glm-4.6::no-thinking')
- `family`: Model family (e.g., 'gpt-4', 'claude-3', 'gemini'). Enables model-specific optimizations and behaviors. Defaults to 'oai-compatible' if not specified
- `baseUrl`: Model-specific base URL. If not provided, the global `oaicopilot.baseUrl` will be used
- `context_length`: The context length supported by the model. Default value is 128000
- `max_tokens`: Maximum number of tokens to generate (range: [1, context_length]). Default value is 4096
- `max_completion_tokens`: Maximum number of tokens to generate (OpenAI's newer standard parameter)
- `vision`: Whether the model supports vision capabilities. Defaults to false
- `temperature`: Sampling temperature (range: [0, 2]). Lower values make the output more deterministic, higher values more creative. Default value is 0
- `top_p`: Top-p sampling value (range: (0, 1]). Optional parameter
- `top_k`: Top-k sampling value (range: [1, ∞)). Optional parameter
- `min_p`: Minimum probability threshold (range: [0, 1]). Optional parameter
- `frequency_penalty`: Frequency penalty (range: [-2, 2]). Optional parameter
- `presence_penalty`: Presence penalty (range: [-2, 2]). Optional parameter
- `repetition_penalty`: Repetition penalty (range: (0, 2]). Optional parameter
- `enable_thinking`: Enable model thinking and reasoning content display (for non-OpenRouter providers)
- `thinking_budget`: Maximum token count for thinking chain output. Optional parameter
- `reasoning`: OpenRouter reasoning configuration, with the following options:
  - `enabled`: Enable reasoning functionality (if not specified, will be inferred from effort or max_tokens)
  - `effort`: Reasoning effort level (high, medium, low, minimal, auto)
  - `exclude`: Exclude reasoning tokens from the final response
  - `max_tokens`: Specific token limit for reasoning (Anthropic style, as an alternative to effort)
- `thinking`: Thinking configuration for the Zai provider
  - `type`: Set to 'enabled' to enable thinking, 'disabled' to disable thinking
- `reasoning_effort`: Reasoning effort level (OpenAI reasoning configuration)
- `headers`: Custom HTTP headers to be sent with every request to this model's provider (e.g., `{"X-API-Version": "v1", "X-Custom-Header": "value"}`). These headers will be merged with the default headers (Authorization, Content-Type, User-Agent)
- `extra`: Extra request body parameters
- `include_reasoning_in_request`: Whether to include `reasoning_content` in assistant messages sent to the API. Supports deepseek-v3.2 and others
- `apiMode`: API mode: 'openai' (default) uses `/chat/completions`, 'ollama' uses `/api/chat`, 'anthropic' uses `/v1/messages`
- `delay`: Model-specific delay in milliseconds between consecutive requests. If not specified, falls back to the global `oaicopilot.delay` configuration
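Putting several of these parameters together, a single model entry might look like the sketch below. The values are illustrative only (based on the GLM-4.6 example earlier in this README); adjust them to your provider's documentation:

```jsonc
"oaicopilot.models": [
  {
    "id": "glm-4.6",
    "configId": "thinking",
    "displayName": "GLM-4.6 (thinking)",
    "owned_by": "zai",
    "apiMode": "openai",
    "context_length": 128000,
    "max_tokens": 8192,
    "temperature": 0.7,
    "top_p": 1,
    "vision": false,
    "thinking": { "type": "enabled" },                     // Zai-style thinking switch
    "include_reasoning_in_request": false,
    "delay": 1000,                                         // per-model delay in ms between requests
    "headers": { "X-Request-Source": "vscode-copilot" },
    "extra": { "seed": 42 }
  }
]
```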
Thanks to all the people who contribute.
- Open issues: https://github.com/JohnnyZ93/oai-compatible-copilot/issues
- License: MIT License Copyright (c) 2025 Johnny Zhao