
Conversation

@BrewTestBot (Contributor)

Created with `brew bump-formula-pr`.

Release notes:

# 🚀 Shimmy v1.7.0: The MoE Revolution is Here!

💥 BREAKTHROUGH: Run 42B+ Models on Consumer Hardware

Shimmy v1.7.0 unleashes the MoE (Mixture of Experts) CPU Offloading Revolution, enabling massive expert models to run on everyday GPUs with up to 99.9% VRAM reduction.


🔥 What's New & Game-Changing

⚡ MoE CPU Offloading Technology

Transform impossible into possible:

  • --cpu-moe: Automatically offload MoE layers to CPU
  • --n-cpu-moe N: Fine-tune performance with precise layer control (flag semantics sketched below)
  • Massive Memory Savings: 15GB models → 4GB VRAM usage
  • Enterprise Ready: Deploy 42B parameter models on 8GB consumer cards
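
To make the flag semantics concrete, here is a minimal sketch of the placement decision the two flags imply. This is an illustration only, not Shimmy's actual implementation; the `MoeOffloadPolicy` type and the layer numbering are hypothetical:

```rust
/// Hypothetical illustration of the CLI flag semantics (not Shimmy's real code):
/// `--cpu-moe` keeps every MoE layer's expert tensors in host RAM, while
/// `--n-cpu-moe N` offloads only the first N MoE layers and leaves the rest on GPU.
#[derive(Debug, Clone, Copy)]
enum MoeOffloadPolicy {
    AllOnCpu,      // --cpu-moe
    FirstN(usize), // --n-cpu-moe N
}

/// Returns true if the expert tensors of `layer` should live in host memory.
fn offload_to_cpu(policy: MoeOffloadPolicy, layer: usize) -> bool {
    match policy {
        MoeOffloadPolicy::AllOnCpu => true,
        MoeOffloadPolicy::FirstN(n) => layer < n,
    }
}

fn main() {
    // e.g. a 32-layer model served with `--n-cpu-moe 8`:
    let policy = MoeOffloadPolicy::FirstN(8);
    for layer in [0, 7, 8, 31] {
        println!("layer {layer}: experts on CPU = {}", offload_to_cpu(policy, layer));
    }
}
```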

📊 Real Performance Gains (Validated)

  • GPT-OSS 20B: 71.5% VRAM reduction (15GB → 4.3GB actual measurement)
  • Phi-3.5-MoE 42B: Runs on consumer hardware for the first time
  • DeepSeek 16B: Intelligent CPU-GPU hybrid execution
  • Smart Tradeoffs: Accept 2-7x slower inference for 10-100x memory savings
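
For reference, the GPT-OSS figure follows directly from the quoted measurements: 1 − 4.3/15 ≈ 0.713, i.e. roughly a 71% reduction, in line with the reported 71.5% against the unrounded baseline.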

🛠️ Technical Excellence

  • First-Class Rust: Enhanced llama.cpp bindings with MoE support
  • Cross-Platform: Windows MSVC CUDA, macOS ARM64 Metal, Linux x86_64/ARM64
  • Production Tested: 295/295 tests passing, comprehensive validation pipeline
  • Still Tiny: Sub-5MB binary maintains legendary efficiency

🎯 Use Cases Unlocked

🏢 Enterprise Deployment

  • Cost Revolution: Run large models without GPU farm investments
  • Scalable AI: Deploy expert models on existing infrastructure
  • Flexible Performance: Balance speed vs. memory for any workload
  • On-Premises Ready: Keep sensitive data in-house with minimal hardware

🔬 Research & Development

  • Democratized Access: Test large models on developer laptops
  • Rapid Iteration: Prototype MoE architectures efficiently
  • Educational Power: Advanced AI models accessible to everyone
  • Hybrid Intelligence: Combine CPU and GPU resources intelligently

🚀 Quick Start Your MoE Journey

Installation Options

# Install from crates.io (LIVE NOW!)
cargo install shimmy

# Or grab platform binaries below ⬇️

🤖 Ready-to-Use MoE Models

Curated collection on HuggingFace, optimized for CPU offloading:

🥇 Recommended Starting Points

# Download and run Phi-3.5-MoE 42B (Q4_K_M) - best balance of quality and performance
# (--local-dir . places the GGUF in the current directory, where the serve command expects it)
huggingface-cli download MikeKuykendall/phi-3.5-moe-q4-k-m-cpu-offload-gguf --local-dir .
./shimmy serve --cpu-moe --model-path phi-3.5-moe-q4-k-m.gguf

# Or DeepSeek-MoE 16B (Q4_K_M) - a faster alternative
huggingface-cli download MikeKuykendall/deepseek-moe-16b-q4-k-m-cpu-offload-gguf --local-dir .
./shimmy serve --cpu-moe --model-path deepseek-moe-16b-q4-k-m.gguf

📊 Complete Model Collection

| Model | Size | Quantization | VRAM | Use Case | Download |
| --- | --- | --- | --- | --- | --- |
| Phi-3.5-MoE | 42B | Q8_0 | ~4GB | 🏆 Maximum Quality | phi-3.5-moe-q8-0-cpu-offload-gguf |
| Phi-3.5-MoE | 42B | Q4_K_M | ~2.5GB | Recommended | phi-3.5-moe-q4-k-m-cpu-offload-gguf |
| Phi-3.5-MoE | 42B | Q2_K | ~1.5GB | 🚀 Ultra Fast | phi-3.5-moe-q2-k-cpu-offload-gguf |
| DeepSeek-MoE | 16B | Q8_0 | ~2GB | 🎯 High Precision | deepseek-moe-16b-q8-0-cpu-offload-gguf |
| DeepSeek-MoE | 16B | Q4_K_M | ~1.2GB | Budget Pick | deepseek-moe-16b-q4-k-m-cpu-offload-gguf |
| DeepSeek-MoE | 16B | Q2_K | ~800MB | 💨 Lightning Fast | deepseek-moe-16b-q2-k-cpu-offload-gguf |
| GPT-OSS | 20B | Various | ~3GB | 🔬 Research/Testing | gpt-oss-20b-moe-cpu-offload-gguf |

🎯 Model Selection Guide

  • 🥇 First Time? → Phi-3.5-MoE Q4_K_M (best balance)
  • 💪 High-End GPU (8GB+)? → Phi-3.5-MoE Q8_0 (maximum quality)
  • 💻 Limited VRAM (4GB)? → DeepSeek-MoE Q4_K_M (budget friendly)
  • ⚡ Speed Critical? → DeepSeek-MoE Q2_K (blazing fast)
  • 🔬 Research/Validation? → GPT-OSS 20B (proven baseline; this guide is restated in the sketch below)
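
The same guide, restated as a tiny helper. This is illustrative only; the `recommend` function and its thresholds are hypothetical, taken from the VRAM column of the table above:

```rust
/// Hypothetical helper encoding the selection guide above; adjust for your hardware.
fn recommend(vram_gb: f32, speed_critical: bool) -> &'static str {
    if speed_critical {
        "DeepSeek-MoE 16B Q2_K (~800MB)" // 💨 lightning fast
    } else if vram_gb >= 8.0 {
        "Phi-3.5-MoE 42B Q8_0 (~4GB)" // 🏆 maximum quality
    } else if vram_gb >= 4.0 {
        "Phi-3.5-MoE 42B Q4_K_M (~2.5GB)" // recommended default
    } else {
        "DeepSeek-MoE 16B Q4_K_M (~1.2GB)" // budget pick
    }
}

fn main() {
    // A 6GB card, throughput not critical → the recommended default.
    println!("{}", recommend(6.0, false));
}
```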

⚡ Launch Commands

# Enable MoE CPU offloading magic
./shimmy serve --cpu-moe --port 11435 --model-path your-model.gguf

# Fine-tune performance for your hardware
./shimmy serve --n-cpu-moe 8 --port 11435 --model-path your-model.gguf

# Standard OpenAI-compatible API - zero changes to your code!
curl -X POST http://localhost:11435/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "your-model", "prompt": "Explain quantum computing in simple terms"}'

📦 Cross-Platform Binaries

Choose your platform and start the revolution:

| Platform | Binary | Features |
| --- | --- | --- |
| 🐧 Linux x86_64 | shimmy-linux-x86_64 | SafeTensors + llama.cpp + MoE |
| 🦾 Linux ARM64 | shimmy-linux-arm64 | Native ARM64 + full MoE support |
| 🪟 Windows x86_64 | shimmy-windows-x86_64.exe | CUDA GPU + MoE offloading |
| 🍎 macOS Intel | shimmy-macos-intel | SafeTensors + Apple MLX |
| 🚀 macOS Apple Silicon | shimmy-macos-arm64 | Metal GPU + MLX + MoE power |

All binaries ship with zero Python dependencies and native SafeTensors support.


🌟 Why This Changes Everything

Before Shimmy v1.7.0: "I need a $10,000 GPU to run expert models"

After Shimmy v1.7.0: "I'm running 42B models on my gaming laptop"

This isn't just an update; it's sustainable AI democratization. Organizations can now:

  • ✅ Deploy cutting-edge models without infrastructure overhaul
  • ✅ Experiment with state-of-the-art architectures on existing hardware
  • ✅ Scale AI capabilities based on actual needs, not hardware limits
  • ✅ Maintain complete data sovereignty with on-premises deployment

📈 Validated & Transparent

  • Multi-Model Testing: 3 models validated across all platforms
  • Real Baselines: Controlled A/B testing with actual measurements
  • Production Quality: Comprehensive release gate system
  • Open Development: Technical validation report available

🤝 Join the Revolution


Ready to revolutionize your AI deployment? The future of efficient model serving is here. Download Shimmy v1.7.0 and experience the MoE revolution! 🚀

View the full release notes at https://github.com/Michael-A-Kuykendall/shimmy/releases/tag/v1.7.0.


github-actions bot added the rust and bump-formula-pr labels on Oct 9, 2025.

github-actions bot commented on Oct 9, 2025:

🤖 An automated task has requested bottles to be published to this PR.

Caution

Please do not push to this PR branch before the bottle commits have been pushed, as this results in a state that is difficult to recover from. If you need to resolve a merge conflict, please use a merge commit. Do not force-push to this PR branch.

github-actions bot added the CI-published-bottle-commits label (the commits for the built bottles have been pushed to the PR branch) on Oct 9, 2025.
BrewTestBot enabled auto-merge on October 9, 2025 09:52 and added this pull request to the merge queue.
Merged via the queue into main with commit 5f0cebc on Oct 9, 2025; 22 checks passed.
BrewTestBot deleted the bump-shimmy-1.7.0 branch on October 9, 2025 10:01.