GitHub - xinding-sys/StreamMind: [ICCV 2025] StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition

StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition

If our project helps you, please give us a star ⭐ on GitHub to support us. 🙏🙏

📰 News

[2025.03.18] Released the training, evaluation, and serving code for StreamMind.

▶️ Click here to watch the demo video

🛠️ Requirements and Installation

Basic Dependencies:

Python >= 3.10
PyTorch >= 2.5.1
CUDA Version >= 11.8
transformers >= 4.44.2 (for mistral tokenizer)
tokenizers >= 0.19.1 (for mistral tokenizer)
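
The minimums listed above can be checked before installing anything else. The following is a minimal sketch, not part of the StreamMind codebase: the helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical, the version parser only compares the leading numeric components, and the CUDA requirement is not checked here.

```python
import re
import sys
from importlib import metadata

# Minimums copied from the dependency list above ("torch" is PyTorch's pip name).
MINIMUMS = {
    "torch": "2.5.1",
    "transformers": "4.44.2",
    "tokenizers": "0.19.1",
}


def parse_version(text):
    """Turn '2.5.1+cu118' into (2, 5, 1); suffixes after the digits are ignored."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unparseable version: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Compare two version strings by their numeric components."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment():
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python {sys.version.split()[0]} is older than 3.10")
    for name, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed (need >= {minimum})")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["All listed minimums are satisfied."]:
        print(line)
```

Reading versions through `importlib.metadata` avoids importing the heavy packages themselves, so the check stays fast even when PyTorch is installed.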

[Online Mode] Install required packages (better for development):

git clone https://github.com/xinding-sys/StreamMind
cd StreamMind
pip install -r requirements.txt
pip install flash-attn==2.5.8 --no-build-isolation

🚀 Main Results

Streaming Dialogue

Offline benchmark

🗝️ Training & Evaluation

Quick Start

Training Data Structure:

StreamMind
├── Online_datasets
│   ├── ego4d
│   │   └── v2
│   │       ├── annotations
│   │       └── full_scale
│   └── MatchTime
│       ├── SN-caption
│       └── Video
└── Offline_datasets
    ├── videollava_pt
    │   ├── llava_image/
    │   ├── valley/
    │   └── valley_llavaimage.json
    └── videollava_sft
        ├── llava_image_tune/
        ├── videochatgpt_tune/
        └── videochatgpt_llavaimage_tune.json
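
Before launching training, it can help to confirm that the layout above is in place. The sketch below is an illustration only: `EXPECTED` and `missing_paths` are hypothetical names, the dataset root is assumed to be the StreamMind checkout, and the check tests only that each entry exists, not what it contains.

```python
from pathlib import Path

# Relative paths copied from the tree above; the root is assumed to be the
# StreamMind checkout itself.
EXPECTED = [
    "Online_datasets/ego4d/v2/annotations",
    "Online_datasets/ego4d/v2/full_scale",
    "Online_datasets/MatchTime/SN-caption",
    "Online_datasets/MatchTime/Video",
    "Offline_datasets/videollava_pt/llava_image",
    "Offline_datasets/videollava_pt/valley",
    "Offline_datasets/videollava_pt/valley_llavaimage.json",
    "Offline_datasets/videollava_sft/llava_image_tune",
    "Offline_datasets/videollava_sft/videochatgpt_tune",
    "Offline_datasets/videollava_sft/videochatgpt_llavaimage_tune.json",
]


def missing_paths(root):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths(Path.cwd())
    if missing:
        print("Missing dataset paths:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected dataset paths are present.")
```

Run it from the repository root; any path it prints still needs to be downloaded or linked into place.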

Command:

# StreamMind training, stage 1
bash scripts/custom/finetune_stage1.sh
# StreamMind training, stage 2
bash scripts/custom/finetune_stage2.sh
# StreamMind evaluation
bash scripts/custom/eval/evaluate.sh
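
If you want to run the three commands above unattended, one option is a small driver that stops at the first failing script. This is a sketch under assumptions: `PIPELINE` and `run_pipeline` are hypothetical names, the scripts are assumed to be meant to run in the listed order, and the relative paths assume the working directory is the repository root.

```python
import subprocess

# Script paths copied from the commands above, in the order they are listed.
PIPELINE = [
    "scripts/custom/finetune_stage1.sh",
    "scripts/custom/finetune_stage2.sh",
    "scripts/custom/eval/evaluate.sh",
]


def run_pipeline(scripts=PIPELINE, runner=subprocess.run):
    """Run each script with bash, in order, and return the ones that completed.

    `runner` defaults to subprocess.run; passing a stand-in makes a dry run
    possible without touching the GPU.
    """
    completed = []
    for script in scripts:
        # check=True makes subprocess.run raise CalledProcessError on a
        # non-zero exit status, so later scripts do not run after a failure.
        runner(["bash", script], check=True)
        completed.append(script)
    return completed
```

Calling `run_pipeline()` from the repository root runs both training stages and then the evaluation; a failing stage surfaces as `subprocess.CalledProcessError`.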

📑 Citation

If you find StreamMind useful for your research and applications, please cite using this BibTeX:

@article{ding2025streammind,
  title={StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition},
  author={Ding, Xin and Wu, Hao and Yang, Yifan and Jiang, Shiqi and Bai, Donglin and Chen, Zhibo and Cao, Ting},
  journal={arXiv preprint arXiv:2503.06220},
  year={2025}
}

👍 Acknowledgement

The codebase of StreamMind is adapted from VideoLLaMA 2. We are also grateful to the following projects, from which StreamMind arose:

🔒 License

This project is released under the Apache 2.0 license, as found in the LICENSE file. The service is a research preview intended for non-commercial use ONLY, subject to the model licenses of LLaMA and Mistral, the Terms of Use of the data generated by OpenAI, and the Privacy Practices of ShareGPT. Please get in touch with us if you find any potential violations.

