Releases: vllm-project/vllm-spyre

v0.9.4

11 Sep 23:48
0a098f4

This release:

  • Fixes a bug where an incorrect attention algorithm was used for static batching with FP8-quantized models
  • Fixes a bug where invalid --num-gpu-blocks-override values could crash the server
  • Supports specific model revisions in the unit test suite
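For illustration, the kind of up-front validation that keeps a bad --num-gpu-blocks-override value from crashing a server can be sketched with argparse. This is not the project's actual fix: the release notes do not say which values were invalid, so the sketch assumes that non-integer and non-positive values are the ones to reject. The `positive_int` helper, `build_parser`, and the `demo-server` program name are hypothetical.

```python
import argparse


def positive_int(text: str) -> int:
    """Parse a strictly positive integer, or raise an argparse error."""
    try:
        value = int(text)
    except ValueError:
        raise argparse.ArgumentTypeError(
            f"expected an integer, got {text!r}") from None
    if value <= 0:
        # Assumption: zero and negative block counts are treated as invalid.
        raise argparse.ArgumentTypeError(
            f"expected a positive integer, got {value}")
    return value


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in parser; the real server defines many more options.
    parser = argparse.ArgumentParser(prog="demo-server")
    parser.add_argument("--num-gpu-blocks-override",
                        type=positive_int, default=None)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--num-gpu-blocks-override", "128"])
    print(args.num_gpu_blocks_override)
```

With a `type=` validator like this, argparse reports a usage error and exits before any engine setup runs, instead of failing later with a traceback.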

What's Changed

Full Changelog: v0.9.3...v0.9.4

v0.9.3

10 Sep 23:01
2dcb70a

This release fixes a unit test that failed on Spyre hardware due to a misconfiguration.

What's Changed

Full Changelog: v0.9.2...v0.9.3

v0.9.2

08 Sep 21:46
f695705

This release:

  • Updates tests to check token probabilities against transformers (instead of logprobs) for better human interpretability
  • Adds the VLLM_SPYRE_GLOO_TIMEOUT_MINUTES config to work around long compilation timeouts with staggered compilation
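As a minimal sketch, the VLLM_SPYRE_GLOO_TIMEOUT_MINUTES setting can be placed in the environment of the process that starts vLLM. Only the variable name comes from the release notes. The `spyre_env` helper is hypothetical, the value 120 is an arbitrary example, and the sketch assumes the setting is read as an environment variable holding a positive whole number of minutes.

```python
import os
from typing import Dict, Optional


def spyre_env(timeout_minutes: int,
              base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the Gloo timeout set."""
    if timeout_minutes <= 0:
        # Assumption: only positive timeouts make sense.
        raise ValueError("timeout_minutes must be a positive integer")
    env = dict(os.environ if base is None else base)
    # Environment variable values are always strings.
    env["VLLM_SPYRE_GLOO_TIMEOUT_MINUTES"] = str(timeout_minutes)
    return env


if __name__ == "__main__":
    env = spyre_env(120)  # 120 is an illustrative value, not a recommendation
    print(env["VLLM_SPYRE_GLOO_TIMEOUT_MINUTES"])
```

The returned mapping can be passed as the `env` argument of `subprocess.run` when launching the server, which leaves the current process environment untouched.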

What's Changed

Full Changelog: v0.9.1...v0.9.2

v0.9.1

05 Sep 21:18
1faf004

This release:

  • Updates the default vllm install to 0.10.1.1
  • Fixes a bug where vllm 0.10.1.1 did not work
  • Fixes a bug where FP8 did not work with continuous batching

What's Changed

Full Changelog: v0.9.0...v0.9.1

v0.9.0

04 Sep 23:09
79577c0

This release:

  • Adds support for reranker models
  • Adds support for vllm 0.10.1
  • Adds extra debug options for tensor parallel operation
  • Fixes a bug where VLLM_SPYRE_MAX_LOAD_PROCESSES did not work properly

What's Changed

Full Changelog: v0.8.0...v0.9.0

v0.8.0

21 Aug 21:43
1a58185

What's Changed

New Contributors

Full Changelog: v0.7.3...v0.8.0

v0.7.3

19 Aug 16:16
470a049

What's Changed

Full Changelog: v0.7.2...v0.7.3

v0.7.2

18 Aug 22:00
ff48ca8

Mostly testing changes; this release is needed so that unsupported compiler tests can be skipped.

What's Changed

Full Changelog: v0.7.1...v0.7.2

v0.7.1

15 Aug 18:11
fa5c966

This release:

  • 🐛 Fixes support for TP 4 for the full ibm-granite/granite-3.3-8b-instruct-cb model
  • 🎉 Allows sequences to join a batch at any time

What's Changed

New Contributors

Full Changelog: v0.7.0...v0.7.1

v0.7.0

12 Aug 18:42
5902f0c

This release:

  • 🎉 Supports FP8 quantized models on CPU!
  • 🚧 Adds scheduler constraints and config for future long-context support with continuous batching
  • 📌 Sets an upper bound on the vllm dependency so that users no longer install later vllm versions that were untested at the time of release
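The effect of an upper bound on a dependency can be sketched as a simple range check. This is not how the package declares its pin: the release notes do not state the bounds, so the versions below are made up for illustration, and `parse_version` and `within_bounds` are hypothetical helpers that assume purely numeric, dot-separated versions.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a numeric dotted version such as '0.10.1.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def within_bounds(installed: str, lower: str, upper: str) -> bool:
    """Check lower <= installed <= upper using tuple comparison."""
    return parse_version(lower) <= parse_version(installed) <= parse_version(upper)


if __name__ == "__main__":
    # Illustrative bounds only; not the project's actual pins.
    lower, upper = "1.2.0", "1.4.2"
    for candidate in ["1.3.0", "1.4.2", "1.5.0"]:
        print(candidate, within_bounds(candidate, lower, upper))
```

Comparing tuples of integers avoids the string-comparison trap where "0.10.0" would sort before "0.9.0".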

What's Changed

Full Changelog: v0.6.0...v0.7.0