
Unofficial Implementation of VistaDream: Sampling multiview consistent images for single-view scene reconstruction

VistaDream is a novel framework for reconstructing 3D scenes from single-view images using Flux-based diffusion models. This implementation combines image outpainting, depth estimation, and 3D Gaussian splatting for high-quality 3D scene generation, with integrated visualization using Rerun.

Uses Rerun for 3D visualization, Gradio for interactive UI, Flux for diffusion-based outpainting, and Pixi for easy installation.


VistaDream 3D scene reconstruction

Overview

VistaDream addresses the challenge of 3D scene reconstruction from a single image through a novel two-stage pipeline:

  1. Coarse 3D Scaffold Construction: Creates a global scene structure by outpainting image boundaries and estimating depth maps
  2. Multi-view Consistency Sampling (MCS): Uses iterative diffusion-based RGB-D inpainting with multi-view consistency constraints to generate high-quality novel views
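
The two stages above can be sketched as a minimal control flow. The function names and data shapes below are hypothetical stand-ins for the actual pipeline entry points, not the project's API:

```python
def build_coarse_scaffold(image):
    """Stage 1 (sketch): outpaint the image boundary and estimate a depth
    map, yielding a coarse global scene structure as a list of RGB-D views."""
    outpainted = {"rgb": list(image), "depth": [0.0] * len(image)}  # placeholder
    return [outpainted]

def mcs_refine(scaffold, n_iters=3):
    """Stage 2 (sketch): Multi-view Consistency Sampling -- iteratively
    inpaint novel RGB-D views under multi-view consistency constraints."""
    views = list(scaffold)
    for _ in range(n_iters):
        # In the real pipeline each iteration renders novel viewpoints,
        # inpaints the holes with Flux, and re-fuses depth; here we just
        # append a copy to show the accumulation of views.
        views.append(dict(views[-1]))
    return views

views = mcs_refine(build_coarse_scaffold([0.1, 0.2, 0.3]))
print(len(views))  # → 4
```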

The framework integrates multiple state-of-the-art models:

  • Flux diffusion models for high-quality image outpainting and inpainting
  • 3D Gaussian Splatting for efficient 3D scene representation
  • Rerun for real-time 3D visualization and debugging

Installation

Prerequisites

  • Linux only, with an NVIDIA GPU (tested with CUDA 12.9)
  • Pixi package manager

NOTE: You may need to change the CUDA version and CUDA compute capability in pyproject.toml (look for cuda-version and TORCH_CUDA_ARCH_LIST, respectively). You can find your CUDA version with nvidia-smi or nvcc --version, and your compute capability with nvidia-smi --query-gpu=compute_cap --format=csv or on the NVIDIA website.
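
As a small helper sketch, the snippet below parses the CSV output of nvidia-smi --query-gpu=compute_cap --format=csv into a TORCH_CUDA_ARCH_LIST-style string; the header-then-one-row-per-GPU format matches nvidia-smi's standard CSV output, but treat the exact value pyproject.toml expects as an assumption:

```python
def arch_list_from_nvidia_smi(csv_output: str) -> str:
    """Turn `nvidia-smi --query-gpu=compute_cap --format=csv` output
    (a "compute_cap" header line, then one capability per GPU) into a
    TORCH_CUDA_ARCH_LIST-style string such as "8.9"."""
    lines = [ln.strip() for ln in csv_output.splitlines() if ln.strip()]
    caps = lines[1:]  # drop the "compute_cap" header row
    # Deduplicate while preserving order (multi-GPU machines repeat values).
    seen: list[str] = []
    for cap in caps:
        if cap not in seen:
            seen.append(cap)
    return ";".join(seen)

# Example: a machine with two identical GPUs reporting capability 8.9.
print(arch_list_from_nvidia_smi("compute_cap\n8.9\n8.9\n"))  # → 8.9
```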

Using Pixi

git clone https://github.com/rerun-io/vistadream.git
cd vistadream
pixi run example

This will automatically download the required models and run the example with the included office image.

Usage

For any of the commands below, add the --help flag to see more options, for example pixi run python tools/run_single_img.py --help.

Single Image Processing

Process a single image with depth estimation and basic 3D reconstruction:

pixi run python tools/run_single_img.py --image-path data/office/IMG_4029.jpg

Flux Outpainting Only

Run just the outpainting component with Rerun visualization:

pixi run python tools/run_flux_outpainting.py --image-path data/office/IMG_4029.jpg --expansion-percent 0.2
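
For intuition on --expansion-percent, a value of 0.2 grows the canvas by roughly 20% of each dimension on every side before outpainting fills the new border. The exact padding rule the tool uses is an assumption here; the arithmetic would look like:

```python
def expanded_size(width: int, height: int, expansion_percent: float) -> tuple[int, int]:
    """Assumed padding rule: grow each border by expansion_percent of the
    corresponding dimension, so each axis gains 2 * pad pixels."""
    pad_w = round(width * expansion_percent)
    pad_h = round(height * expansion_percent)
    return width + 2 * pad_w, height + 2 * pad_h

# A 1024x768 photo with --expansion-percent 0.2:
print(expanded_size(1024, 768, 0.2))  # → (1434, 1076)
```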

Multi-Image Pose & Depth Pipeline (VGGT + MoGe)

Estimate camera intrinsics/extrinsics, per-image depth, and confidence masks from a directory of images, then fuse them into an (optionally downsampled) colored point cloud. Results stream live to a Rerun viewer.

pixi run python tools/run_multi_img.py --image-dir /path/to/image_folder

Connect to an already running Rerun viewer (instead of spawning a new one):

pixi run python tools/run_multi_img.py --rr-config.connect --image-dir /path/to/image_folder

Notes:

  • Supported image extensions: .png, .jpg, .jpeg
  • Automatically orients & recenters camera poses ("up" orientation heuristic) and logs a consolidated point cloud plus per-view RGB, depth, filtered depth, MoGe depth, and confidence.
  • Uses VGGT (multiview geometry transformer) for joint pose & depth, robust depth confidence filtering, MoGe for refined monocular depth, and voxel downsampling to target a manageable point count.
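
The voxel downsampling mentioned above can be sketched in pure Python: bucket points into a cubic voxel grid and keep one averaged point per occupied voxel. This is a simplified stand-in for the pipeline's implementation, not its actual code:

```python
from collections import defaultdict
from math import floor

def voxel_downsample(points, voxel_size):
    """Replace all points that fall into the same cubic voxel by their centroid.

    points: iterable of (x, y, z) tuples; returns one centroid per voxel.
    """
    buckets = defaultdict(list)
    for x, y, z in points:
        key = (floor(x / voxel_size), floor(y / voxel_size), floor(z / voxel_size))
        buckets[key].append((x, y, z))
    centroids = []
    for pts in buckets.values():
        n = len(pts)
        centroids.append(tuple(sum(axis) / n for axis in zip(*pts)))
    return centroids

# Three points, two of which share the unit voxel at the origin:
cloud = [(0.0, 0.0, 0.0), (0.2, 0.1, 0.0), (5.0, 5.0, 5.0)]
print(len(voxel_downsample(cloud, voxel_size=1.0)))  # → 2
```

Shrinking voxel_size keeps more detail; growing it trades detail for a more manageable point count.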

Gradio Web Interface

Launch an interactive web interface for experimenting with the models:

pixi run python tools/gradio_app.py

Key Features

  • Single Image to 3D: Complete pipeline from single image to navigable 3D scene
  • Multi-Image Geometry: Batch multi-view camera & depth estimation with fused colored point cloud export
  • Memory Efficient: Model offloading support for GPU memory management
  • Real-time Visualization: Integrated Rerun viewer for 3D scene inspection
  • Training-free: No fine-tuning required for existing diffusion models
  • High Quality: Multi-view consistency sampling ensures coherent 3D reconstruction

Project Structure

├── src/vistadream/
│   ├── api/                 # High-level pipeline APIs
│   │   ├── flux_outpainting.py    # Outpainting-only pipeline
│   │   ├── multi_image_pipeline.py # Multi-image pose & depth fusion (VGGT + MoGe)
│   │   └── vistadream_pipeline.py # Full 3D reconstruction pipeline
│   ├── flux/                # Flux diffusion model integration
│   │   ├── cli_*.py         # Command-line interfaces
│   │   ├── model.py         # Flux transformer architecture
│   │   ├── sampling.py      # Diffusion sampling logic
│   │   └── util.py          # Model loading and configuration
│   └── ops/                 # Core operations
│       ├── flux.py          # Flux model wrappers
│       ├── gs/              # Gaussian splatting implementation
│       ├── trajs/           # Camera trajectory generation
│       └── visual_check.py  # 3D scene validation tools
└── tools/                   # Standalone applications
    ├── gradio_app.py        # Web interface
    ├── run_flux_outpainting.py
    ├── run_vistadream.py    # Main 3D pipeline
    └── run_single_img.py    # Single image processing

Model Checkpoints

Models are automatically downloaded from Hugging Face on first run. Manual download:

pixi run huggingface-cli download pablovela5620/vistadream --local-dir ckpt/

Expected structure:

ckpt/
├── flux_fill/
│   ├── flux1-fill-dev.safetensors
│   └── ae.safetensors
├── vec.pt
├── txt.pt
└── txt_256.pt
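
A quick sanity check against the expected layout above can be written with the standard library alone; the relative paths are taken verbatim from the tree, while the helper itself is just an illustrative convenience:

```python
from pathlib import Path

# Relative paths from the expected ckpt/ layout shown above.
EXPECTED_CKPT_FILES = [
    "flux_fill/flux1-fill-dev.safetensors",
    "flux_fill/ae.safetensors",
    "vec.pt",
    "txt.pt",
    "txt_256.pt",
]

def missing_checkpoints(ckpt_dir: str = "ckpt") -> list[str]:
    """Return the expected checkpoint files that are absent under ckpt_dir."""
    root = Path(ckpt_dir)
    return [rel for rel in EXPECTED_CKPT_FILES if not (root / rel).is_file()]

if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoint files:", ", ".join(missing))
    else:
        print("All expected checkpoints present.")
```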

Citation

Thanks to the original authors! If you use VistaDream in your research, please cite:

Original Repo

@inproceedings{wang2025vistadream,
  title={VistaDream: Sampling multiview consistent images for single-view scene reconstruction},
  author={Wang, Haiping and Liu, Yuan and Liu, Ziwei and Wang, Wenping and Dong, Zhen and Yang, Bisheng},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2025}
}

Acknowledgements

This project builds upon several outstanding works:

Related Work

  • ASUKA - Enhanced image inpainting for mitigating unwanted object insertion
  • MoGe - Accurate monocular geometry estimation for open-domain images
