
LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs (ICCV 2025 Highlight)

How can we properly evaluate text-to-image generation models?

[Figure: LMM4LMM pipeline overview]

T2I Model Ranks

[Figure: T2I model rankings]

LMM-T2I Models

[Figure: LMM-based T2I generation models]

LMM-VQA Models

[Figure: LMM-based VQA models]

EvalMi-50K Download

huggingface-cli download IntMeGroup/EvalMi-50K --repo-type dataset --local-dir ./EvalMi-50K
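
If the dataset repository is gated on the Hugging Face Hub (whether it is depends on the repo's access settings), authenticate with your access token before downloading:

huggingface-cli login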

🛠️ Installation

Clone this repository:

git clone https://github.com/IntMeGroup/LMM4LMM.git

Create a conda virtual environment and activate it:

conda create -n LMM4LMM python=3.9 -y
conda activate LMM4LMM

Install dependencies using requirements.txt:

pip install -r requirements.txt

Install flash-attn==2.3.6:

pip install flash-attn==2.3.6 --no-build-isolation

Alternatively, you can compile it from source:

git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
git checkout v2.3.6
python setup.py install
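
Whichever install path you choose, a quick check confirms that the module imports and reports its version (a minimal sanity check, not part of the original instructions):

python -c "import flash_attn; print(flash_attn.__version__)"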

Alternatively, if you are on CUDA 12, you can use the prepacked environment:

huggingface-cli download IntMeGroup/env LMM4LMM.tar.gz --repo-type dataset --local-dir /home/user/anaconda3/envs
mkdir -p /home/user/anaconda3/envs/LMM4LMM
tar -xzf /home/user/anaconda3/envs/LMM4LMM.tar.gz -C /home/user/anaconda3/envs/LMM4LMM
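
After extraction, activate the environment. Assuming the archive was produced with conda-pack (an assumption; the packing method is not stated here), the usual steps are:

source /home/user/anaconda3/envs/LMM4LMM/bin/activate
conda-unpack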

🌈 Training

Preparation

huggingface-cli download IntMeGroup/EvalMi-50K --repo-type dataset --include "data/*" --local-dir .

For stage 1 training (text-based quality levels):

sh shell/train_stage1.sh

For stage 2 training (fine-tuning the vision encoder and LLM with LoRA):

sh shell/train_stage2.sh

For question-answering (QA) training:

sh shell/train_qa.sh

🌈 Evaluation

Download the pretrained weights:

huggingface-cli download IntMeGroup/LMM4LMM-Perception --local-dir ./weights/stage2/stage2_mos1
huggingface-cli download IntMeGroup/LMM4LMM-Correspondence --local-dir ./weights/stage2/stage2_mos2
huggingface-cli download IntMeGroup/LMM4LMM-QA --local-dir ./weights/qa

For perception and correspondence score evaluation:

sh shell/eval_scores.sh

For question-answering (QA) evaluation:

sh shell/eval_qa.sh

🌈 Inference

Download the pretrained weights:

huggingface-cli download IntMeGroup/LMM4LMM-Perception --local-dir ./weights/stage2/stage2_mos1
huggingface-cli download IntMeGroup/LMM4LMM-Correspondence --local-dir ./weights/stage2/stage2_mos2

Configure file paths: before running the inference scripts, update the following paths in the data/infer_mos1.json and data/infer_mos2.json configuration files (a sketch of the expected layout follows the list):

root: Path to the root directory where the image data is stored.
annotation_infer: Path to the file containing image paths for inference.
img_prompt: Path to the file containing image prompts for inference.
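
For reference, a data/infer_mos1.json entry might look like the sketch below (the three keys are the ones documented above; all path values are hypothetical placeholders, and the real file may carry additional fields):

{
  "root": "/data/EvalMi-50K/images",
  "annotation_infer": "/data/EvalMi-50K/annotations_infer.txt",
  "img_prompt": "/data/EvalMi-50K/img_prompts.txt"
}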

For perception score inference:

sh shell/infer_perception.sh

For T2I correspondence score inference:

sh shell/infer_correspondence.sh

📌 TODO

  • ✅ Release the training code
  • ✅ Release the evaluation code
  • ✅ Release the inference code
  • ✅ Release the EvalMi-50K Database

Quick Access to T2I Models

Model | Code/Project Link
SD_v2-1 | https://huggingface.co/stabilityai/stable-diffusion-2-1
i-Code-V3 | https://github.com/microsoft/i-Code
SDXL_base_1 | https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0
DALLE3 | https://openai.com/index/dall-e-3
LLMGA | https://github.com/dvlab-research/LLMGA
Kandinsky-3 | https://github.com/ai-forever/Kandinsky-3
LWM | https://github.com/LargeWorldModel/LWM
Playground | https://huggingface.co/playgroundai/playground-v2.5-1024px-aesthetic
LaVi-Bridge | https://github.com/ShihaoZhaoZSH/LaVi-Bridge
ELLA | https://github.com/TencentQQGYLab/ELLA
Seed-xi | https://github.com/AILab-CVC/SEED-X
PixArt-sigma | https://github.com/PixArt-alpha/PixArt-sigma
LlamaGen | https://github.com/FoundationVision/LlamaGen
Kolors | https://github.com/Kwai-Kolors/Kolors
Flux_schnell | https://huggingface.co/black-forest-labs/FLUX.1-schnell
Omnigen | https://github.com/VectorSpaceLab/OmniGen
EMU3 | https://github.com/baaivision/Emu
Vila-u | https://github.com/mit-han-lab/vila-u
SD3_5_large | https://huggingface.co/stabilityai/stable-diffusion-3.5-large
Show-o | https://github.com/showlab/Show-o
Janus | https://github.com/deepseek-ai/Janus
Hart | https://github.com/mit-han-lab/hart
NOVA | https://github.com/baaivision/NOVA
Infinity | https://github.com/FoundationVision/Infinity

📧 Contact

If you have any inquiries, please don't hesitate to reach out via email at [email protected].

🎓 Citations

If you find our work useful, please cite our paper as:

@misc{wang2025lmm4lmmbenchmarkingevaluatinglargemultimodal,
      title={LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs}, 
      author={Jiarui Wang and Huiyu Duan and Yu Zhao and Juntong Wang and Guangtao Zhai and Xiongkuo Min},
      year={2025},
      eprint={2504.08358},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2504.08358}, 
}
