Add integration tests #338
Merged
This PR adds a script to run integration tests.
The script defines a list of notebooks to test, starts a SkyPilot cluster, and executes the notebooks one by one.
There is no need to SSH into the cluster. Just run
scripts/launch-test-cluster.sh --no-pull
and it will execute the tests, print the results to the console, and terminate the cluster.
Currently, we test the most popular notebooks from the examples folder.
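The "run the notebooks one by one and print results" flow could look roughly like the sketch below. It is illustrative only and is not the script from this PR: `run_notebooks`, `format_report`, and the `execute` callable are hypothetical names, and the real executor (whatever actually runs a notebook on the cluster) is passed in rather than assumed.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class NotebookResult:
    path: str
    passed: bool
    error: str = ""


def run_notebooks(
    paths: Iterable[str],
    execute: Callable[[str], None],
) -> list[NotebookResult]:
    """Run each notebook in order and record pass/fail without stopping early."""
    results = []
    for path in paths:
        try:
            execute(path)  # hypothetical executor; expected to raise on failure
            results.append(NotebookResult(path, True))
        except Exception as exc:
            results.append(NotebookResult(path, False, f"{type(exc).__name__}: {exc}"))
    return results


def format_report(results: list[NotebookResult]) -> str:
    """Build one console line per notebook: status, path, and error if any."""
    lines = [
        f"{'PASS' if r.passed else 'FAIL'}  {r.path}  {r.error}".rstrip()
        for r in results
    ]
    return "\n".join(lines)
```

Catching the exception per notebook means one failing example does not hide the results of the ones after it.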
I had to pin `openpipe-art==0.4.7` in all examples because the notebooks could not be run programmatically with the old version. To keep the examples compatible with Google Colab, we probably need to release a new art version that supports vLLM 0.9.2 and pin the Colab examples to the old vLLM version. Can we?
I was constantly running into
RuntimeError: CUDA error: an illegal memory access was encountered
It turns out this happens when different vLLM engines are started one after another. For example, one notebook builds a vLLM engine with the default configuration, and the next run passes `engine_args` with `enforce_eager=True`. The memory is not fully cleaned up, and the new engine fails.
Workaround: set `model._internal_config` to None during tests. This is not ideal, since it means we cannot test configs.
The test run also overrides some notebook variables (to run only one training step for faster execution) and changes the project name so logs are recorded under the “Tester” project on W&B.
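One way to override notebook variables before execution is to rewrite simple top-level assignments in each cell's source. The sketch below is an assumption about how this could be done, not the mechanism used in this PR; `override_assignments` and the variable name `NUM_STEPS` are hypothetical, while the “Tester” project name comes from the description above.

```python
import re


def override_assignments(source: str, overrides: dict) -> str:
    """Replace the right-hand side of simple top-level `NAME = value` lines.

    Only unindented, single-line assignments are touched; comparisons (`==`)
    and indented code are left unchanged.
    """
    out = []
    for line in source.splitlines():
        match = re.match(r"^([A-Za-z_]\w*)\s*=(?!=)", line)
        if match and match.group(1) in overrides:
            name = match.group(1)
            line = f"{name} = {overrides[name]!r}"
        out.append(line)
    return "\n".join(out)


# Hypothetical cell source; NUM_STEPS is an illustrative name.
cell = "NUM_STEPS = 50\nproject = 'demo'"
patched = override_assignments(cell, {"NUM_STEPS": 1, "project": "Tester"})
```

A line-based rewrite like this is deliberately narrow: it misses multi-line or computed assignments, which is acceptable only if the notebooks keep their tunable values as plain constants.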
Added a `--no-pull` argument to both `launch-test-cluster.sh` and `launch-cluster.sh` to deploy the current version without reverting to the main branch.
Pytest caught some silent type errors that were not visible during regular execution, so there are minor changes in `src/art/trajectories.py` and `src/art/unsloth/train.py`.
I’m still occasionally hitting the following error during test execution:
RuntimeError: Sleep mode can only be used for one instance per process.
Not sure why it’s happening. Can we disable sleep mode for tests?
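Since the error message refers to one instance per process, one direction worth trying (untested here, and not something this PR does) is to run each notebook test in its own Python process so that no engine state carries over between tests. A minimal sketch using only the standard library, with `run_isolated` as a hypothetical helper name:

```python
import subprocess
import sys


def run_isolated(code: str, timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Run a snippet in a fresh Python interpreter and capture its output.

    Each call starts a new process, so per-process state from a previous
    run is not shared with the next one.
    """
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

A non-zero `returncode` signals a failed run, and `stderr` holds the traceback for the console report. Whether this actually avoids the sleep-mode error would need to be confirmed on the cluster.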