
LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs (ICCV 2025 Highlight)

How can we properly evaluate text-to-image generation models?

[Figure: LMM4LMM pipeline overview]

T2I Model Ranks

[Figure: T2I model rankings]

LMM-T2I Models

[Figure: LMM-based T2I generation models]

LMM-VQA Models

[Figure: LMM-based VQA models]

EvalMi-50K Download

huggingface-cli download IntMeGroup/EvalMi-50K --repo-type dataset --local-dir ./EvalMi-50K
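
If the dataset repository is gated on the Hugging Face Hub (whether it is depends on the repo's access settings), authenticate with your access token before downloading:

huggingface-cli login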

🛠️ Installation

Clone this repository:

git clone https://github.com/IntMeGroup/LMM4LMM.git

Create a conda virtual environment and activate it:

conda create -n LMM4LMM python=3.9 -y
conda activate LMM4LMM

Install dependencies using requirements.txt:

pip install -r requirements.txt

Install flash-attn==2.3.6:

pip install flash-attn==2.3.6 --no-build-isolation

Alternatively, you can compile it from source:

git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
git checkout v2.3.6
python setup.py install
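
Whichever install path you choose, a quick check confirms that the module imports and reports its version (a minimal sanity check, not part of the original instructions):

python -c "import flash_attn; print(flash_attn.__version__)"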

Alternatively, if you are on CUDA 12, you can use the prepacked environment:

huggingface-cli download IntMeGroup/env LMM4LMM.tar.gz --repo-type dataset --local-dir /home/user/anaconda3/envs
mkdir -p /home/user/anaconda3/envs/LMM4LMM
tar -xzf /home/user/anaconda3/envs/LMM4LMM.tar.gz -C /home/user/anaconda3/envs/LMM4LMM
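
After extraction, activate the environment. Assuming the archive was produced with conda-pack (an assumption; the packing method is not stated here), the usual steps are:

source /home/user/anaconda3/envs/LMM4LMM/bin/activate
conda-unpack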

🌈 Training

Preparation

huggingface-cli download IntMeGroup/EvalMi-50K --repo-type dataset --include "data/*" --local-dir .

For stage 1 training (text-based quality levels):

sh shell/train_stage1.sh

For stage 2 training (fine-tuning the vision encoder and LLM with LoRA):

sh shell/train_stage2.sh

For question-answering (QA) training:

sh shell/train_qa.sh

🌈 Evaluation

Download the pretrained weights:

huggingface-cli download IntMeGroup/LMM4LMM-Perception --local-dir ./weights/stage2/stage2_mos1
huggingface-cli download IntMeGroup/LMM4LMM-Correspondence --local-dir ./weights/stage2/stage2_mos2
huggingface-cli download IntMeGroup/LMM4LMM-QA --local-dir ./weights/qa

For perception and correspondence score evaluation:

sh shell/eval_scores.sh

For question-answering (QA) evaluation:

sh shell/eval_qa.sh

🌈 Inference

Download the pretrained weights:

huggingface-cli download IntMeGroup/LMM4LMM-Perception --local-dir ./weights/stage2/stage2_mos1
huggingface-cli download IntMeGroup/LMM4LMM-Correspondence --local-dir ./weights/stage2/stage2_mos2

Configure file paths: before running the inference scripts, update the following paths in the data/infer_mos1.json and data/infer_mos2.json configuration files (a sketch of the expected layout follows the list):

root: Path to the root directory where the image data is stored.
annotation_infer: Path to the file containing image paths for inference.
img_prompt: Path to the file containing image prompts for inference.
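
For reference, a data/infer_mos1.json entry might look like the sketch below (the three keys are the ones documented above; all path values are hypothetical placeholders, and the real file may carry additional fields):

{
  "root": "/data/EvalMi-50K/images",
  "annotation_infer": "/data/EvalMi-50K/annotations_infer.txt",
  "img_prompt": "/data/EvalMi-50K/img_prompts.txt"
}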

For perception score inference:

sh shell/infer_perception.sh

For T2I correspondence score inference:

sh shell/infer_correspondence.sh

📌 TODO

  • ✅ Release the training code
  • ✅ Release the evaluation code
  • ✅ Release the inference code
  • ✅ Release the EvalMi-50K Database

Quick Access to T2I Models

Model | Code/Project Link
SD_v2-1 | https://huggingface.co/stabilityai/stable-diffusion-2-1
i-Code-V3 | https://github.com/microsoft/i-Code
SDXL_base_1 | https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0
DALLE3 | https://openai.com/index/dall-e-3
LLMGA | https://github.com/dvlab-research/LLMGA
Kandinsky-3 | https://github.com/ai-forever/Kandinsky-3
LWM | https://github.com/LargeWorldModel/LWM
Playground | https://huggingface.co/playgroundai/playground-v2.5-1024px-aesthetic
LaVi-Bridge | https://github.com/ShihaoZhaoZSH/LaVi-Bridge
ELLA | https://github.com/TencentQQGYLab/ELLA
Seed-xi | https://github.com/AILab-CVC/SEED-X
PixArt-sigma | https://github.com/PixArt-alpha/PixArt-sigma
LlamaGen | https://github.com/FoundationVision/LlamaGen
Kolors | https://github.com/Kwai-Kolors/Kolors
Flux_schnell | https://huggingface.co/black-forest-labs/FLUX.1-schnell
Omnigen | https://github.com/VectorSpaceLab/OmniGen
EMU3 | https://github.com/baaivision/Emu
Vila-u | https://github.com/mit-han-lab/vila-u
SD3_5_large | https://huggingface.co/stabilityai/stable-diffusion-3.5-large
Show-o | https://github.com/showlab/Show-o
Janus | https://github.com/deepseek-ai/Janus
Hart | https://github.com/mit-han-lab/hart
NOVA | https://github.com/baaivision/NOVA
Infinity | https://github.com/FoundationVision/Infinity

📧 Contact

If you have any inquiries, please don't hesitate to reach out via email at [email protected].

🎓 Citations

If you find our work useful, please cite our paper as:

@misc{wang2025lmm4lmmbenchmarkingevaluatinglargemultimodal,
      title={LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs}, 
      author={Jiarui Wang and Huiyu Duan and Yu Zhao and Juntong Wang and Guangtao Zhai and Xiongkuo Min},
      year={2025},
      eprint={2504.08358},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2504.08358}, 
}
