 ---
+- &gptoss
+  name: "gpt-oss-20b"
+  url: "github:mudler/LocalAI/gallery/harmony.yaml@master"
+  license: apache-2.0
+  tags:
+    - gguf
+    - gpu
+    - cpu
+    - openai
+  icon: https://raw.githubusercontent.com/openai/gpt-oss/main/docs/gpt-oss-20b.svg
+  urls:
+    - https://huggingface.co/openai/gpt-oss-20b
+    - https://huggingface.co/ggml-org/gpt-oss-20b-GGUF
+  description: |
+    Welcome to the gpt-oss series, OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases.
+
+    We’re releasing two flavors of the open models:
+
+    gpt-oss-120b — for production, general-purpose, high-reasoning use cases that fit into a single H100 GPU (117B parameters with 5.1B active parameters)
+    gpt-oss-20b — for lower latency and local or specialized use cases (21B parameters with 3.6B active parameters)
+
+    Both models were trained on our harmony response format and should only be used with that format; they will not work correctly otherwise.
+
+    This model card is dedicated to the smaller gpt-oss-20b model. Check out gpt-oss-120b for the larger model.
+
+    Highlights
+
+    Permissive Apache 2.0 license: Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment.
+    Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs.
+    Full chain-of-thought: Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. It’s not intended to be shown to end users.
+    Fine-tunable: Fully customize the models to your specific use case through parameter fine-tuning.
+    Agentic capabilities: Use the models’ native capabilities for function calling, web browsing, Python code execution, and Structured Outputs.
+    Native MXFP4 quantization: The models are trained with native MXFP4 precision for the MoE layer, letting gpt-oss-120b run on a single H100 GPU and gpt-oss-20b run within 16GB of memory.
+  overrides:
+    parameters:
+      model: gpt-oss-20b-mxfp4.gguf
+  files:
+    - filename: gpt-oss-20b-mxfp4.gguf
+      sha256: 52f57ab7d3df3ba9173827c1c6832e73375553a846f3e32b49f1ae2daad688d4
+      uri: huggingface://ggml-org/gpt-oss-20b-GGUF/gpt-oss-20b-mxfp4.gguf
 - &afm
   name: "arcee-ai_afm-4.5b"
   url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
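The entry above wires gpt-oss-20b to the harmony chat template via `harmony.yaml`, and LocalAI serves installed gallery models through its OpenAI-compatible API. A minimal usage sketch, assuming a LocalAI instance on its default port 8080 with the model already installed; the `Reasoning: medium` system line follows the harmony convention for selecting reasoning effort, and whether the bundled template forwards it unchanged is an assumption here:

```python
# Minimal sketch: chat with gpt-oss-20b through LocalAI's OpenAI-compatible API.
# Assumes LocalAI runs on localhost:8080 (its default) and the gallery model is installed.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # LocalAI endpoint, not api.openai.com
    api_key="not-needed",                 # ignored unless the server has auth enabled
)

resp = client.chat.completions.create(
    model="gpt-oss-20b",  # matches `name` in the gallery entry above
    messages=[
        # Harmony convention: reasoning effort is requested in the system prompt
        # (assumption: the harmony.yaml template passes this line through as-is).
        {"role": "system", "content": "Reasoning: medium"},
        {"role": "user", "content": "Explain MXFP4 quantization in two sentences."},
    ],
)
print(resp.choices[0].message.content)
```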
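The `files` block pins an exact SHA-256 for the MXFP4 GGUF, which the gallery uses to validate the download. The same check is easy to reproduce by hand; a minimal sketch, assuming the file has already been fetched into the working directory (the path is illustrative):

```python
# Verify a downloaded GGUF against the sha256 pinned in the gallery entry.
import hashlib
from pathlib import Path

EXPECTED = "52f57ab7d3df3ba9173827c1c6832e73375553a846f3e32b49f1ae2daad688d4"
path = Path("gpt-oss-20b-mxfp4.gguf")  # illustrative local path

digest = hashlib.sha256()
with path.open("rb") as f:
    # Hash in 1 MiB chunks; the file is multi-gigabyte, so avoid reading it whole.
    for chunk in iter(lambda: f.read(1 << 20), b""):
        digest.update(chunk)

print("checksum OK" if digest.hexdigest() == EXPECTED else "checksum MISMATCH")
```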