- Add prerequisites section (transformers patch, model consolidation)
- Expand evaluation docs for LLaVA-1.5, 1.6, Video-LLaVA, and Qwen
- Add full parametrized training examples with token budget presets
- Update directory tree to reflect current project structure
- Add en-core-web-sm, nltk, mpi4py, openai, huggingface_hub[hf_xet], ipdb
- Remove dead commented-out visionzip setup block
- Refactor image token position tracking to handle multi-image inputs and fix prompt_len computation in llava_arch.py
- Add fallback for corrupted/missing video frames in processing_video.py (repeat last valid frame or use black)
- Add longest common contiguous subsequence utility in utils.py for PDrop salient token matching
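
For reviewers unfamiliar with the matching step, the following is a minimal sketch of a longest common contiguous subsequence search over two token-ID sequences. It is an illustration only and not the code added to utils.py: the function name `longest_common_contiguous` and its return convention are assumptions made for this example.

```python
from typing import Sequence, Tuple


def longest_common_contiguous(a: Sequence[int], b: Sequence[int]) -> Tuple[int, int, int]:
    """Return (start_in_a, start_in_b, length) of the longest contiguous run
    shared by both sequences. Hypothetical helper; returns (0, 0, 0) when
    nothing matches. On ties, the first run found is kept."""
    best_len, best_a, best_b = 0, 0, 0
    # prev[j] holds the length of the common run ending at a[i-2], b[j-1].
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended one position earlier in both.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_a = i - best_len
                    best_b = j - best_len
        prev = curr
    return best_a, best_b, best_len
```

The sketch uses the standard dynamic-programming formulation with two rolling rows, so memory stays linear in the length of the second sequence while time is proportional to the product of both lengths.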
- Add file existence validation before loading image/video data to skip missing or zero-byte files gracefully
- Fix text-only samples missing image list initialization
- Add optional training throughput statistics reporting
- Reduce DeepSpeed initial_scale_power (16->10) and switch to ZeRO stage 1 to mitigate NaN loss during fine-tuning
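
As a rough illustration of the file validation described above, the sketch below drops samples whose media file is missing or zero bytes while keeping text-only samples. It is not the code from this change: the helper names `is_usable_file` and `filter_samples`, and the `"image"` key, are assumptions made for the example.

```python
import os
from typing import Dict, List


def is_usable_file(path: str) -> bool:
    """True only if the path is an existing regular file with nonzero size."""
    return os.path.isfile(path) and os.path.getsize(path) > 0


def filter_samples(samples: List[Dict], key: str = "image") -> List[Dict]:
    """Keep text-only samples and samples whose media file is usable.
    Hypothetical helper; `key` names the field holding the file path."""
    kept = []
    for sample in samples:
        path = sample.get(key)
        # A sample without a media path is text-only and is kept as-is.
        if path is None or is_usable_file(path):
            kept.append(sample)
    return kept
```

Checking before loading means a bad file costs one `stat` call instead of an exception inside the data loader.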
These scripts were superseded by the unified scripts under scripts/llava/v1_5/ and scripts/llava/v1_6/. Removing to avoid confusion with the canonical script locations.
Summary
- `llava_arch.py`: correctly handle multi-image inputs and fix `prompt_len` computation; add longest common contiguous subsequence utility for PDrop salient token matching
- Add dependencies (`nltk`, `mpi4py`, `openai`, `en-core-web-sm`, `huggingface_hub[hf_xet]`), remove dead visionzip setup block, and delete 28 deprecated pdrop scripts superseded by `scripts/llava/`

Test plan
- Scripts under `scripts/llava/` and `scripts/videollava/` still work after deprecated script removal
- `pip install -e .` succeeds with the updated `setup.py` dependencies

Made with Cursor