[ICCV 2025] StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition

If our project helps you, please give us a star ⭐ on GitHub to support us. 🙏🙏

📰 News

  • [2025.03.18] Released the training, evaluation, and serving code for StreamMind.

[Figure: overview]

🛠️ Requirements and Installation

Basic Dependencies:

  • Python >= 3.10
  • PyTorch >= 2.5.1
  • CUDA Version >= 11.8
  • transformers >= 4.44.2 (for mistral tokenizer)
  • tokenizers >= 0.19.1 (for mistral tokenizer)

[Online Mode] Install required packages (better for development):

git clone https://github.com/xinding-sys/StreamMind
cd StreamMind
pip install -r requirements.txt
pip install flash-attn==2.5.8 --no-build-isolation
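
After installation, the minimums listed under Basic Dependencies can be checked with a short script. This is a minimal sketch rather than part of the repository: the function names (`version_tuple`, `meets_minimum`, `check_environment`) are hypothetical, it assumes the packages are installed under the distribution names `torch`, `transformers`, and `tokenizers`, and it does not check the CUDA version or flash-attn.

```python
import re
import sys
from importlib import metadata

# Minimum versions copied from the Basic Dependencies list.
MINIMUMS = {"torch": "2.5.1", "transformers": "4.44.2", "tokenizers": "0.19.1"}


def version_tuple(text):
    """Turn '2.5.1+cu118' into (2, 5, 1); local and pre-release suffixes are ignored."""
    numbers = []
    for part in text.split("+")[0].split("."):
        match = re.match(r"\d+", part)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'dev2024'
        numbers.append(int(match.group()))
    return tuple(numbers)


def meets_minimum(installed, minimum):
    """Compare two version strings numerically, component by component."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_environment():
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python {sys.version.split()[0]} < 3.10")
    for package, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed (need >= {minimum})")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{package} {installed} < {minimum}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["All basic dependencies look satisfied."]:
        print(line)
```

The comparison is deliberately simple: a pre-release such as `2.5.1rc1` is treated as equal to `2.5.1`, which is usually acceptable for a quick sanity check.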

🚀 Main Results

Streaming Dialogue

[Figures: streaming dialogue results]

Offline benchmark

[Figures: offline benchmark results]

🗝️ Training & Evaluation

Quick Start

  1. Training Data Structure:
StreamMind
├── Online_datasets
│   ├── ego4d
│   │   └── v2
│   │       ├── annotations
│   │       └── full_scale
│   └── MatchTime
│       ├── SN-caption
│       └── Video
└── Offline_datasets
    ├── videollava_pt
    │   ├── llava_image/
    │   ├── valley/
    │   └── valley_llavaimage.json
    └── videollava_sft
        ├── llava_image_tune/
        ├── videochatgpt_tune/
        └── videochatgpt_llavaimage_tune.json
  2. Commands:
# StreamMind training, stage 1
bash scripts/custom/finetune_stage1.sh
# StreamMind training, stage 2
bash scripts/custom/finetune_stage2.sh
# StreamMind evaluation
bash scripts/custom/eval/evaluate.sh
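
Before launching the scripts above, it can help to confirm that the data layout from step 1 is in place. The following is a minimal sketch, not part of the repository: `EXPECTED` and `missing_paths` are hypothetical names, and the script assumes the dataset folders sit directly under the directory it is run from (the `StreamMind` root in the tree).

```python
from pathlib import Path

# Relative paths copied from the training data structure in step 1.
EXPECTED = [
    "Online_datasets/ego4d/v2/annotations",
    "Online_datasets/ego4d/v2/full_scale",
    "Online_datasets/MatchTime/SN-caption",
    "Online_datasets/MatchTime/Video",
    "Offline_datasets/videollava_pt/llava_image",
    "Offline_datasets/videollava_pt/valley",
    "Offline_datasets/videollava_pt/valley_llavaimage.json",
    "Offline_datasets/videollava_sft/llava_image_tune",
    "Offline_datasets/videollava_sft/videochatgpt_tune",
    "Offline_datasets/videollava_sft/videochatgpt_llavaimage_tune.json",
]


def missing_paths(root):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing dataset paths:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("Dataset layout matches the expected structure.")
```

The check only tests for existence; it does not validate the contents of the annotation or JSON files.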

📑 Citation

If you find StreamMind useful for your research and applications, please cite using this BibTeX:

@article{ding2025streammind,
  title={StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition},
  author={Ding, Xin and Wu, Hao and Yang, Yifan and Jiang, Shiqi and Bai, Donglin and Chen, Zhibo and Cao, Ting},
  journal={arXiv preprint arXiv:2503.06220},
  year={2025}
}

👍 Acknowledgement

The codebase of StreamMind is adapted from VideoLLaMA 2. We are also grateful to the other projects that StreamMind builds on.

🔒 License

This project is released under the Apache 2.0 license as found in the LICENSE file. The service is a research preview intended for non-commercial use ONLY, subject to the model licenses of LLaMA and Mistral, the Terms of Use of the data generated by OpenAI, and the Privacy Practices of ShareGPT. Please get in touch with us if you find any potential violations.