Using two Mac Studio M2 Ultra machines (192 GB and 64 GB of unified memory), I created a GPU cluster, and the Resources view shows both workers as ready. I then deployed DeepSeek-R1-UD-IQ1_S.gguf (131 GB, a single file) locally with the following distributed configuration:
Result
Inference is very slow: 0.69 tokens/s.
Expected behavior
The same hardware typically reaches about 17 tokens/s with an Ollama or llama.cpp backend, so GPUStack should be able to achieve comparable throughput.
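To put the gap in numbers, here is a minimal sketch that expresses throughput as generated tokens divided by wall-clock seconds and computes the ratio between the two figures reported above. The helper names are hypothetical and not part of GPUStack, Ollama, or llama.cpp. The 17 tokens/s baseline is the figure quoted in this report, not a new measurement.

```python
def tokens_per_second(completion_tokens: int, elapsed_s: float) -> float:
    """Hypothetical helper: decode throughput = generated tokens / wall-clock seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return completion_tokens / elapsed_s


def slowdown(baseline_tps: float, observed_tps: float) -> float:
    """How many times slower the observed throughput is than the baseline."""
    if observed_tps <= 0:
        raise ValueError("observed_tps must be positive")
    return baseline_tps / observed_tps


# Figures taken from this report (same two-machine M2 Ultra setup)
gpustack_tps = 0.69   # observed with GPUStack
baseline_tps = 17.0   # quoted for Ollama / llama.cpp

print(f"{slowdown(baseline_tps, gpustack_tps):.1f}x slower")  # prints "24.6x slower"
```

Measuring both backends with the same definition (for example, the same prompt and the same number of generated tokens) keeps the comparison fair, because prompt processing time can otherwise skew a single tokens/s figure.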