Releases: vllm-project/vllm-spyre
v0.9.4
This release:
- Fixes a bug where the incorrect attention algorithm was used for static batching with fp8-quantized models
- Fixes a bug where invalid --num-gpu-blocks-override values could crash the server
- Supports specific model revisions in the unit test suite
What's Changed
- fix: block_size to be multiple of max_batch_size by @wallashss in #454
- fix: static batching with FP8 by @wallashss in #457
- ⚗️ Support model revision in tests by @joerunde in #456
Full Changelog: v0.9.3...v0.9.4
v0.9.3
This release fixes a bug where a unit test failed on Spyre hardware due to a misconfiguration.
What's Changed
- 🎨 make available_blocks as 18 in scheduler test by @prashantgupta24 in #453
- [cb] scheduler heuristic 2: unblock long prompts by @yannicks1 in #440
Full Changelog: v0.9.2...v0.9.3
v0.9.2
This release:
- Updates tests to check token probabilities against transformers (instead of logprobs) for better human interpretability
- Adds the VLLM_SPYRE_GLOO_TIMEOUT_MINUTES config to work around long compilation timeouts with staggered compilation
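The switch from comparing raw logprobs to comparing token probabilities makes test tolerances easier for humans to read: probabilities live in [0, 1], so an absolute tolerance has an obvious meaning, while logprob differences blow up for unlikely tokens. A hypothetical sketch of such a comparison (the real test utilities in #447 differ):

```python
import math


def probs_close(logprob_a: float, logprob_b: float, abs_tolerance: float = 0.05) -> bool:
    """Compare two tokens' probabilities (not logprobs) within an absolute
    tolerance. Hypothetical helper illustrating the idea from the release note,
    not vllm-spyre's actual test code."""
    return abs(math.exp(logprob_a) - math.exp(logprob_b)) <= abs_tolerance
```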
What's Changed
- CI: Show isort diff by @ckadner in #449
- 🔥 Remove vllm 0.9.2 support by @joerunde in #448
- chore: Use Python 3.11 by default for type-check by @ckadner in #450
- [cb] scheduler heuristic balancing prefill/decode prioritization by @yannicks1 in #433
- ♻️ Temporary fix for VLLM_SPYRE_GLOO_TIMEOUT_MINUTES by @prashantgupta24 in #452
- ♻️ use token probability with abs_tolerance instead of comparing logprobs by @prashantgupta24 in #447
Full Changelog: v0.9.1...v0.9.2
v0.9.1
This release:
- Updates the default vllm install to 0.10.1.1
- Fixes a bug where vllm 0.10.1.1 did not work
- Fixes a bug where FP8 did not work with continuous batching
What's Changed
- 🐛 fix image by @joerunde in #442
- [CB] optimization: cache volumetric constraint in scheduler by @yannicks1 in #418
- feat: FP8 initial support on continuous batching by @wallashss in #402
- 🐛 Fix SB max-model-len override by @prashantgupta24 in #436
- ⬆️ bump default vllm version to 0.10.1.1 by @joerunde in #446
Full Changelog: v0.9.0...v0.9.1
v0.9.0
This release:
- Adds support for reranker models
- Adds support for vllm 0.10.1
- Adds extra debug options for tensor parallel operation
- Fixes a bug where VLLM_SPYRE_MAX_LOAD_PROCESSES did not work properly
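The VLLM_SPYRE_MAX_LOAD_PROCESSES fix (#444) changed the env var to be parsed as an int instead of a bool; parsing it as a bool would turn any non-empty string, including "4", into True and lose the actual process limit. A hypothetical sketch of the corrected handling (the default value and function name are placeholders, not the project's exact code):

```python
import os


def max_load_processes(default: int = 0) -> int:
    """Read VLLM_SPYRE_MAX_LOAD_PROCESSES as an integer process limit.

    Hypothetical sketch: the earlier bug treated the variable as a bool,
    so "4" became True instead of the limit 4. Parsing as int keeps the count.
    """
    raw = os.environ.get("VLLM_SPYRE_MAX_LOAD_PROCESSES")
    if not raw:
        return default  # placeholder default; the real project default may differ
    return int(raw)
```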
What's Changed
- [GHA] 🐛 fix: Save HF models cache for all jobs by @yannicks1 in #400
- [GHA] 🎨 refactor test yaml by @yannicks1 in #401
- 🔥 remove FLEX_OVERWRITE_NMB_FRAME by @prashantgupta24 in #408
- [test] 🎨 fix test description string by @yannicks1 in #416
- [cb][test] fix scheduler constraint and add tests for batch x tkv limit by @yannicks1 in #417
- ⚡ Cache LLMs during tests by @joerunde in #396
- [CB][Tests] Reduce number of steps in scheduler steps tests by @sducouedic in #409
- 🎨 reword logs for loading model weights by @prashantgupta24 in #397
- 🔥 trim local envs not required anymore by @prashantgupta24 in #399
- 🎨 make hf_cache.json prettier by @joerunde in #422
- ⬆️ bump base image by @joerunde in #427
- ♻️ [tests] Full model testing by @prashantgupta24 in #428
- 🎨 add info about DT_DEEPRT_VERBOSE by @prashantgupta24 in #430
- 🐛 fixup compilation wrapper by @joerunde in #431
- 🔨 Add debug log redirection option by @joerunde in #429
- [doc] 👨‍🎨 Adding drawings explaining optimizations by @yannicks1 in #426
- [cb][test] add tests for volumetric constraint with prefill optimization by @yannicks1 in #425
- 🐛 solve undetected merge conflict with main by @yannicks1 in #432
- Add reranker support by @maxdebayser in #403
- 🎨 print relative tolerance diff in tests by @prashantgupta24 in #438
- Bump vllm to v0.10.1 and add compatibility code by @maxdebayser in #443
- fix VLLM_SPYRE_MAX_LOAD_PROCESSES to int instead of bool by @jberkhahn in #444
Full Changelog: v0.8.0...v0.9.0
v0.8.0
What's Changed
- 🎨 improve log statement by @prashantgupta24 in #395
- [GHA] Triggering test against vLLM:main for ready labels added by bot by @yannicks1 in #398
- [tests] load only the needed models from hf cache by @yannicks1 in #393
- Add VLLM_SPYRE_MAX_LOAD_PROCESSES to limit number of processes that … by @jberkhahn in #357
- 🐛 COMPILATION_MODE conditionally by @prashantgupta24 in #404
New Contributors
- @jberkhahn made their first contribution in #357
Full Changelog: v0.7.3...v0.8.0
v0.7.3
What's Changed
- 🎨 make max_num_seqs 4 for online test by @prashantgupta24 in #394
Full Changelog: v0.7.2...v0.7.3
v0.7.2
Mostly testing changes, plus the ability to skip tests that the compiler does not support.
What's Changed
- [Docs] Add q3 roadmap by @rafvasq in #382
- Remove block_size from arguments on LLM constructor. by @yannicks1 in #383
- [CB] fix scheduler assert prints by @yannicks1 in #387
- 🐛 fix a bug in tests, add DISABLE_ASSERTS by @prashantgupta24 in #375
- [Tests] Limit long-context test to 16k by @rafvasq in #389
- [GHA] skip tests against vLLM:main if ready label not assigned yet by @yannicks1 in #388
- 🔥 remove long context test with bad config by @joerunde in #391
- feat: removed 32k from test_swap_decode_programs_for_cb by @wallashss in #390
- ✅ Compiler unsupported flag for tests by @prashantgupta24 in #392
Full Changelog: v0.7.1...v0.7.2
v0.7.1
This release:
- 🐛 Fixes support for TP 4 for the full ibm-granite/granite-3.3-8b-instruct-cb model
- 🎉 Allows sequences to join a batch at any time
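With that fix, serving the model across four cards follows the standard vLLM CLI shape. A rough illustration (Spyre deployments may need additional environment variables or flags beyond this):

```shell
# Serve the granite model with tensor parallelism across 4 devices.
# Standard vLLM CLI form; Spyre-specific setup is not shown here.
vllm serve ibm-granite/granite-3.3-8b-instruct-cb --tensor-parallel-size 4
```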
What's Changed
- 🎨 fix warning logs for _cast_bf16_to_f16 by @prashantgupta24 in #372
- [CB] Optimization: allowing sequences to join a batch anytime by @yannicks1 in #340
- Add optional option --max_tokens by @kkvtran in #377
- Add compat code for changing Pooler function signature by @maxdebayser in #374
- 🎨 fully parametrize the online script by @yannicks1 in #378
- [CB] Support batch size 1 for decode, simplify warmup by @yannicks1 in #312
- feat: tests to check swapping of decode program by @wallashss in #370
- [Tests] Add long context batch tests by @rafvasq in #365
- Document support for POWER architecture in README by @RajalakshmiSR in #366
- [Compat]: Fix renamed NewRequestData argument by @maxdebayser in #380
- ⚡ cache hf results in tests by @joerunde in #373
- [high prio][CB] 🐛 fix warmup by @yannicks1 in #384
- [CB] add warning when exceeding 32K context length by @yannicks1 in #385
- 🎨 fix spacing in log msg by @prashantgupta24 in #386
New Contributors
- @kkvtran made their first contribution in #377
- @RajalakshmiSR made their first contribution in #366
Full Changelog: v0.7.0...v0.7.1
v0.7.0
This release:
- 🎉 Supports FP8 quantized models on cpu!
- 🚧 Adds scheduler constraints and config for future long-context support with continuous batching
- 📌 Sets an upper bound on the vllm dependency, so users no longer install future vllm versions that were untested at release time
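The upper-bound pin amounts to a dependency constraint of roughly this shape in the package metadata (the version numbers below are illustrative placeholders, not the actual bounds vllm-spyre uses):

```toml
# pyproject.toml (illustrative fragment)
[project]
dependencies = [
    # Upper bound keeps untested future vllm releases from being installed.
    "vllm>=0.9.2,<0.10.0",  # placeholder bounds, not the real pin
]
```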
What's Changed
- ♻️ fix vllm:main - replace EngineCoreRequest with Request by @prashantgupta24 in #354
- [Tests][FP8]: Add fp8 test by @rafvasq in #350
- feat: removed triton from dependencies by @wallashss in #353
- [docs] remove pooling models from supported features by @yannicks1 in #358
- [docs][CB] remove warning that no output correctness is asserted for scheduler step tests by @yannicks1 in #360
- [embedding] support newest vllm main branch by @yannicks1 in #361
- [CB] hard code number of spyre blocks to 2080 by @yannicks1 in #362
- ♻️ use fp8 model for testing SB + CB by @prashantgupta24 in #359
- [cb][tests] 🐛 fix bug in test utils, please merge ASAP by @yannicks1 in #367
- 📌 pin vllm upper bound by @prashantgupta24 in #369
- [CB] set and respect compiler constraint VLLM_DT_MAX_BATCH_TKV_LIMIT by @yannicks1 in #363
Full Changelog: v0.6.0...v0.7.0