Awesome-Visual-Generation-Alignment-Survey

A collection of awesome papers on the alignment of visual generation models (including AR and diffusion/flow models).

We also use this repo as a reference for our survey.

We welcome community contributions!

Tutorial on Reinforcement Learning

First Two Works in Each Subfield:

Traditional Policy Gradient

  • DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models, [pdf], 2023.05
  • Training Diffusion Models with Reinforcement Learning, [pdf], 2023.05
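
Both works cast the denoising chain as a multi-step MDP and optimize it with a REINFORCE-style policy gradient. A simplified sketch of the score-function estimator they build on, where r scores the final image x_0 for prompt c (details such as importance weighting and KL regularization vary per paper):

```math
\nabla_\theta J(\theta) = \mathbb{E}_{c,\; x_{0:T} \sim p_\theta} \left[ r(x_0, c) \sum_{t=1}^{T} \nabla_\theta \log p_\theta(x_{t-1} \mid x_t, c) \right]
```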

GRPO

  • DanceGRPO: Unleashing GRPO on Visual Generation, [pdf], 2025.05
  • Flow-GRPO: Training Flow Matching Models via Online RL, [pdf], 2025.05
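
A sketch of the group-relative advantage these works adapt from the original GRPO recipe: sample a group of G images per prompt, score each with a reward model, and normalize within the group, so no learned critic is needed:

```math
\hat{A}_i = \frac{r_i - \operatorname{mean}(\{r_1, \dots, r_G\})}{\operatorname{std}(\{r_1, \dots, r_G\})}
```

This advantage then plugs into a PPO-style clipped objective; DanceGRPO and Flow-GRPO differ mainly in how they make flow/diffusion sampling stochastic (e.g., an SDE reformulation) so that per-step log-probabilities exist to optimize.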

DPO

  • Diffusion Model Alignment Using Direct Preference Optimization, [pdf], 2023.11
  • Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model, [pdf], 2023.11
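
Both works adapt DPO, which needs no explicit reward model, to diffusion. For reference, the original pairwise DPO loss over a preferred/dispreferred pair (x^w, x^l) for condition c is:

```math
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(c,\, x^w,\, x^l)} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(x^w \mid c)}{\pi_{\mathrm{ref}}(x^w \mid c)} - \beta \log \frac{\pi_\theta(x^l \mid c)}{\pi_{\mathrm{ref}}(x^l \mid c)} \right) \right]
```

Diffusion likelihoods are intractable, so, roughly, Diffusion-DPO substitutes an ELBO-based surrogate that reduces to differences of noise-prediction errors, while the reward-model-free approach (D3PO) treats denoising as an MDP and applies the preference objective per step.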

Reward Feedback Learning

  • ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation, [pdf], 2023.04
  • Directly Fine-Tuning Diffusion Models on Differentiable Rewards, [pdf], 2023.09
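
These works skip the policy gradient entirely and backpropagate a differentiable reward straight through the sampling chain. A toy, self-contained sketch of the idea (a reparameterized Gaussian stands in for the diffusion sampler, and the quadratic reward is a hypothetical stand-in for a learned reward model):

```python
import torch

# Stand-ins for generator parameters (in the real setting: the denoiser's weights).
mu = torch.zeros(2, requires_grad=True)
log_sigma = torch.zeros(2, requires_grad=True)
opt = torch.optim.Adam([mu, log_sigma], lr=1e-2)

target = torch.tensor([3.0, -1.0])  # what the toy reward prefers

def reward(x):
    # Differentiable toy reward; higher is better.
    return -((x - target) ** 2).sum(dim=-1)

for _ in range(2000):
    eps = torch.randn(64, 2)         # fresh noise each iteration
    x = mu + log_sigma.exp() * eps   # reparameterized "sampling"
    loss = -reward(x).mean()         # gradient flows through x into mu, log_sigma
    opt.zero_grad()
    loss.backward()
    opt.step()

print(mu.detach())  # converges to roughly [3.0, -1.0]
```

In the diffusion setting the same trick is memory-hungry, so ReFL (from the ImageReward paper) applies the reward gradient only at a late denoising step, and DRaFT truncates backpropagation to the last K sampling steps.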

Alignment on AR models

  • Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step, [pdf], 2025.01
  • Autoregressive Image Generation Guided by Chains of Thought, [pdf], 2025.02
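
AR generators emit discrete image tokens, so LLM-style RLHF machinery transfers almost directly. A sketch of the generic per-token objective underneath these works, with a_t the t-th image token for prompt c and \hat{A}_t an advantage (sequence-level or per-step):

```math
J(\theta) = \mathbb{E} \left[ \sum_{t} \hat{A}_t \, \log \pi_\theta(a_t \mid a_{<t}, c) \right]
```

What the two works above add is where the signal comes from: verification or chain-of-thought rewards attached to intermediate generation steps rather than only to the final image.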

Reward Models

  • ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation, [pdf], 2023.04
  • Human Preference Score: Better Aligning Text-to-Image Models with Human Preference, [pdf], 2023.03
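
Both reward models are trained on human preference data with a Bradley-Terry-style ranking loss; a sketch of its pairwise form (ImageReward extends this to full rankings of k images per prompt by summing over ranked pairs):

```math
\mathcal{L}_{\mathrm{RM}}(\phi) = -\,\mathbb{E}_{(c,\, x^w,\, x^l)} \left[ \log \sigma\!\big( r_\phi(c, x^w) - r_\phi(c, x^l) \big) \right]
```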

Alignment on Diffusion/Flow Models

Reinforcement Learning-based (RLHF)

  • DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models, [pdf], 2023.05
  • Training Diffusion Models with Reinforcement Learning, [pdf], 2023.05
  • Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning, [pdf], 2025.02
  • Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards, [pdf], 2025.01
  • Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation, [pdf], 2024.12
  • DanceGRPO: Unleashing GRPO on Visual Generation, [pdf], 2025.05
  • Flow-GRPO: Training Flow Matching Models via Online RL, [pdf], 2025.05
  • MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE, [pdf], 2025.07
  • TempFlow-GRPO: When Timing Matters for GRPO in Flow Models, [pdf], 2025.08
  • Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning, [pdf], 2025.08
  • DiffusionNFT: Online Diffusion Reinforcement with Forward Process, [pdf], 2025.09
  • BranchGRPO: Stable and Efficient GRPO with Structured Branching in Diffusion Models, [pdf], 2025.09
  • Advantage Weighted Matching: Aligning RL with Pretraining in Diffusion Models, [pdf], 2025.09
  • PCPO: Proportionate Credit Policy Optimization for Aligning Image Generation Models, [pdf], 2025.09
  • Coefficients-Preserving Sampling for Reinforcement Learning with Flow Matching, [pdf], 2025.09
  • Dynamic-TreeRPO: Breaking the Independent Trajectory Bottleneck with Structured Sampling, [pdf], 2025.09
  • MultiCrafter: High-Fidelity Multi-Subject Generation via Spatially Disentangled Attention and Identity-Aware Reinforcement Learning, [pdf], 2025.09
  • G^2RPO: Granular GRPO for Precise Reward in Flow Models, [pdf], 2025.10
  • Reinforcing Diffusion Models by Direct Group Preference Optimization, [pdf], 2025.10
  • Smart-GRPO: Smartly Sampling Noise for Efficient RL of Flow-Matching Models, [pdf], 2025.10
  • Identity-GRPO: Optimizing Multi-Human Identity-preserving Video Generation via Reinforcement Learning, [pdf], 2025.10
  • Self-Forcing++: Towards Minute-Scale High-Quality Video Generation, [pdf], 2025.10

DPO-based

  • Diffusion Model Alignment Using Direct Preference Optimization, [pdf], 2023.11
  • Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model, [pdf], 2023.11
  • A Dense Reward View on Aligning Text-to-Image Diffusion with Preference, [pdf], 2024.02
  • Aligning Diffusion Models by Optimizing Human Utility, [pdf], 2024.04
  • Boost Your Human Image Generation Model via Direct Preference Optimization, [pdf], 2024.05
  • Curriculum Direct Preference Optimization for Diffusion and Consistency Models, [pdf], 2024.05
  • Margin-aware Preference Optimization for Aligning Diffusion Models without Reference, [pdf], 2024.06
  • DSPO: Direct Score Preference Optimization for Diffusion Model Alignment, [pdf], 2024.09
  • Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models, [pdf], 2024.09
  • Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization, [pdf], 2024.10
  • Scalable Ranked Preference Optimization for Text-to-Image Generation, [pdf], 2024.10
  • Prioritize Denoising Steps on Diffusion Model Preference Alignment via Explicit Denoised Distribution Estimation, [pdf], 2024.11
  • PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation, [pdf], 2024.12
  • SafetyDPO: Scalable Safety Alignment for Text-to-Image Generation, [pdf], 2024.12
  • OnlineVPO: Align Video Diffusion Model with Online Video-Centric Preference Optimization, [pdf], 2024.12
  • VideoDPO: Omni-Preference Alignment for Video Diffusion Generation, [pdf], 2024.12
  • Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization, [pdf], 2024.06
  • Personalized Preference Fine-tuning of Diffusion Models, [pdf], 2025.01
  • Calibrated Multi-Preference Optimization for Aligning Diffusion Models, [pdf], 2025.02
  • Direct Distributional Optimization for Provable Alignment of Diffusion Models, [pdf], 2025.02
  • InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model Alignment, [pdf], 2025.03
  • CHATS: Combining Human-Aligned Optimization and Test-Time Sampling for Text-to-Image Generation, [pdf], 2025.02
  • Refining Alignment Framework for Diffusion Models with Intermediate-Step Preference Ranking, [pdf], 2025.02
  • Aligning Text to Image in Diffusion Models is Easier Than You Think, [pdf], 2025.03
  • DreamDPO: Aligning Text-to-3D Generation with Human Preferences via Direct Preference Optimization, [pdf], 2025.02
  • Flow-DPO: Improving Video Generation with Human Feedback, [pdf], 2025.01
  • HuViDPO: Enhancing Video Generation through Direct Preference Optimization for Human-Centric Alignment, [pdf], 2025.02
  • Fine-Tuning Diffusion Generative Models via Rich Preference Optimization, [pdf], 2025.03
  • D-Fusion: Direct Preference Optimization for Aligning Diffusion Models with Visually Consistent Samples, [pdf], 2025.05
  • Smoothed Preference Optimization via ReNoise Inversion for Aligning Diffusion Models with Varied Human Preferences, [pdf], 2025.06
  • Follow-Your-Preference: Towards Preference-Aligned Image Inpainting, [pdf], 2025.09
  • Towards Better Optimization For Listwise Preference in Diffusion Models, [pdf], 2025.10

Reward Feedback Learning

  • ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation, [pdf], 2023.04
  • Aligning Text-to-Image Diffusion Models with Reward Backpropagation, [pdf], 2023.10
  • Directly Fine-Tuning Diffusion Models on Differentiable Rewards, [pdf], 2023.09
  • Feedback Efficient Online Fine-Tuning of Diffusion Models, [pdf], 2024.02
  • Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models, [pdf], 2024.05
  • Reward Fine-Tuning Two-Step Diffusion Models via Learning Differentiable Latent-Space Surrogate Reward, [pdf], 2024.11
  • InstructVideo: Instructing Video Diffusion Models with Human Feedback, [pdf], 2023.12
  • IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation, [pdf], 2024.10
  • Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference, [pdf], 2025.09
  • Identity-Preserving Image-to-Video Generation via Reward-Guided Optimization, [pdf], 2025.10

Technical Reports

We list only the technical reports that use alignment methods:

  • Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model, [pdf], 2025.02
  • Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model, [pdf], 2025.03
  • Seedream 3.0 Technical Report, [pdf], 2025.04
  • Seedance 1.0: Exploring the Boundaries of Video Generation Models, [pdf], 2025.06
  • Seedream 4.0: Toward Next-generation Multimodal Image Generation, [pdf], 2025.09
  • Qwen-Image Technical Report, [pdf], 2025.08
  • Skywork-UniPic2, [pdf], 2025.09
  • BLIP3o-NEXT: Next Frontier of Native Image Generation, [pdf], 2025.10

Alignment on AR models

  • Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step, [pdf], 2025.01
  • Autoregressive Image Generation Guided by Chains of Thought, [pdf], 2025.02
  • LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization, [pdf], 2025.03
  • SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL, [pdf], 2025.04
  • UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation, [pdf], 2025.05
  • UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning, [pdf], 2025.05
  • ReasonGen-R1: CoT for Autoregressive Image Generation Models through SFT and RL, [pdf], 2025.05
  • Unlocking Aha Moments via Reinforcement Learning: Advancing Collaborative Visual Comprehension and Generation, [pdf], 2025.06
  • Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO, [pdf], 2025.05
  • CoT-lized Diffusion: Let’s Reinforce T2I Generation Step-by-step, [pdf], 2025.07
  • X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again, [pdf], 2025.07
  • T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT, [pdf], 2025.05
  • AR-GRPO: Training Autoregressive Image Generation Models via Reinforcement Learning, [pdf], 2025.08
  • Group Critical-token Policy Optimization for Autoregressive Image Generation, [pdf], 2025.09
  • STAGE: Stable and Generalizable GRPO for Autoregressive Image Generation, [pdf], 2025.09
  • Layout-Conditioned Autoregressive Text-to-Image Generation via Structured Masking, [pdf], 2025.09

Benchmarks & Reward Models

Benchmarks

  • DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers, [pdf], 2022.02
  • Human Evaluation of Text-to-Image Models on a Multi-Task Benchmark, [pdf], 2022.11
  • LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation, [pdf], 2023.05
  • VPGen & VPEval: Visual Programming for Text-to-Image Generation and Evaluation, [pdf], 2023.05
  • T2I-CompBench: An Enhanced and Comprehensive Benchmark for Compositional Text-to-Image Generation, [pdf], 2023.07
  • GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment, [pdf], 2023.10
  • Holistic Evaluation of Text-to-Image Models, [pdf], 2023.11
  • Social Reward: Evaluating and Enhancing Generative AI through Million-User Feedback from an Online Creative Community, [pdf], 2024.02
  • Rich Human Feedback for Text to Image Generation, [pdf], 2023.12
  • Learning Multi-Dimensional Human Preference for Text-to-Image Generation, [pdf], 2024.05
  • Evaluating Text-to-Visual Generation with Image-to-Text Generation, [pdf], 2024.04
  • Multimodal Large Language Models Make Text-to-Image Generative Models Align Better, [pdf], 2024.04
  • Measuring Style Similarity in Diffusion Models, [pdf], 2024.04
  • DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation, [pdf], 2024.06
  • PAL: Sample-Efficient Personalized Reward Modeling for Pluralistic Alignment, [pdf], 2024.06
  • Video-Bench: Human-Aligned Video Generation Benchmark, [pdf], 2025.04
  • ImageDoctor: Diagnosing Text-to-Image Generation via Grounded Image Reasoning, [pdf], 2025.10

Reward Models

  • Human Preference Score: Better Aligning Text-to-Image Models with Human Preference, [pdf], 2023.03
  • ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation, [pdf], 2023.04
  • Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation, [pdf], 2023.05
  • Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis, [pdf], 2023.06
  • Improving Video Generation with Human Feedback, [pdf], 2025.01
  • VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation, [pdf], 2024.06
  • VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation, [pdf], 2024.12
  • Unified Reward Model for Multimodal Understanding and Generation, [pdf], 2025.03
  • LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment, [pdf], 2024.12
  • HPSv3: Towards Wide-Spectrum Human Preference Score, [pdf], 2025.08
  • RewardDance: Reward Scaling in Visual Generation, [pdf], 2025.09
  • Unlocking the Essence of Beauty: Advanced Aesthetic Reasoning with Relative-Absolute Policy Optimization, [pdf], 2025.09
  • EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling, [pdf], 2025.09
  • VideoScore2: Think before You Score in Generative Video Evaluation, [pdf], 2025.09
  • EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing, [pdf], 2025.09

Alignment with Prompt Engineering

  • Optimizing Prompts for Text-to-Image Generation, [pdf], 2022.12
  • RePrompt: Automatic Prompt Editing to Refine AI-Generative Art Towards Precise Expressions, [pdf], 2023.02
  • Improving Text-to-Image Consistency via Automatic Prompt Optimization, [pdf], 2024.03
  • Dynamic Prompt Optimizing for Text-to-Image Generation, [pdf], 2024.04
  • T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT, [pdf], 2025.05
  • RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning, [pdf], 2025.05
