
Unofficial Implementation of VistaDream: Sampling multiview consistent images for single-view scene reconstruction

VistaDream is a novel framework for reconstructing 3D scenes from single-view images using Flux-based diffusion models. This implementation combines image outpainting, depth estimation, and 3D Gaussian splatting for high-quality 3D scene generation, with integrated visualization using Rerun.

Uses Rerun for 3D visualization, Gradio for interactive UI, Flux for diffusion-based outpainting, and Pixi for easy installation.


VistaDream 3D scene reconstruction

Overview

VistaDream addresses the challenge of 3D scene reconstruction from a single image through a novel two-stage pipeline:

  1. Coarse 3D Scaffold Construction: Creates a global scene structure by outpainting image boundaries and estimating depth maps
  2. Multi-view Consistency Sampling (MCS): Uses iterative diffusion-based RGB-D inpainting with multi-view consistency constraints to generate high-quality novel views
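
The two stages above can be sketched as a minimal control flow. The function names and data shapes below are hypothetical stand-ins for the actual pipeline entry points, not the project's API:

```python
def build_coarse_scaffold(image):
    """Stage 1 (sketch): outpaint the image boundary and estimate a depth
    map, yielding a coarse global scene structure as a list of RGB-D views."""
    outpainted = {"rgb": list(image), "depth": [0.0] * len(image)}  # placeholder
    return [outpainted]

def mcs_refine(scaffold, n_iters=3):
    """Stage 2 (sketch): Multi-view Consistency Sampling -- iteratively
    inpaint novel RGB-D views under multi-view consistency constraints."""
    views = list(scaffold)
    for _ in range(n_iters):
        # In the real pipeline each iteration renders novel viewpoints,
        # inpaints the holes with Flux, and re-fuses depth; here we just
        # append a copy to show the accumulation of views.
        views.append(dict(views[-1]))
    return views

views = mcs_refine(build_coarse_scaffold([0.1, 0.2, 0.3]))
print(len(views))  # → 4
```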

The framework integrates multiple state-of-the-art models:

  • Flux diffusion models for high-quality image outpainting and inpainting
  • 3D Gaussian Splatting for efficient 3D scene representation
  • Rerun for real-time 3D visualization and debugging

Installation

Prerequisites

  • Linux only, with an NVIDIA GPU (tested with CUDA 12.9)
  • Pixi package manager

NOTE: You may need to change the CUDA version and CUDA compute capability in pyproject.toml (look for cuda-version and TORCH_CUDA_ARCH_LIST, respectively). You can find your CUDA version with nvidia-smi or nvcc --version, and your compute capability with nvidia-smi --query-gpu=compute_cap --format=csv or on the NVIDIA website.
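
As a small helper sketch, the snippet below parses the CSV output of nvidia-smi --query-gpu=compute_cap --format=csv into a TORCH_CUDA_ARCH_LIST-style string; the header-then-one-row-per-GPU format matches nvidia-smi's standard CSV output, but treat the exact value pyproject.toml expects as an assumption:

```python
def arch_list_from_nvidia_smi(csv_output: str) -> str:
    """Turn `nvidia-smi --query-gpu=compute_cap --format=csv` output
    (a "compute_cap" header line, then one capability per GPU) into a
    TORCH_CUDA_ARCH_LIST-style string such as "8.9"."""
    lines = [ln.strip() for ln in csv_output.splitlines() if ln.strip()]
    caps = lines[1:]  # drop the "compute_cap" header row
    # Deduplicate while preserving order (multi-GPU machines repeat values).
    seen: list[str] = []
    for cap in caps:
        if cap not in seen:
            seen.append(cap)
    return ";".join(seen)

# Example: a machine with two identical GPUs reporting capability 8.9.
print(arch_list_from_nvidia_smi("compute_cap\n8.9\n8.9\n"))  # → 8.9
```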

Using Pixi

git clone https://github.com/rerun-io/vistadream.git
cd vistadream
pixi run example

This will automatically download the required models and run the example with the included office image.

Usage

For any of the commands below, add the --help flag to see more options, for example pixi run python tools/run_single_img.py --help.

Single Image Processing

Process a single image with depth estimation and basic 3D reconstruction:

pixi run python tools/run_single_img.py --image-path data/office/IMG_4029.jpg

Flux Outpainting Only

Run just the outpainting component with Rerun visualization:

pixi run python tools/run_flux_outpainting.py --image-path data/office/IMG_4029.jpg --expansion-percent 0.2
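
For intuition on --expansion-percent, a value of 0.2 grows the canvas by roughly 20% of each dimension on every side before outpainting fills the new border. The exact padding rule the tool uses is an assumption here; the arithmetic would look like:

```python
def expanded_size(width: int, height: int, expansion_percent: float) -> tuple[int, int]:
    """Assumed padding rule: grow each border by expansion_percent of the
    corresponding dimension, so each axis gains 2 * pad pixels."""
    pad_w = round(width * expansion_percent)
    pad_h = round(height * expansion_percent)
    return width + 2 * pad_w, height + 2 * pad_h

# A 1024x768 photo with --expansion-percent 0.2:
print(expanded_size(1024, 768, 0.2))  # → (1434, 1076)
```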

Multi-Image Pose & Depth Pipeline (VGGT + MoGe)

Estimate camera intrinsics/extrinsics, per-image depth, and confidence masks from a directory of images, then fuse them into an (optionally downsampled) colored point cloud. Results stream live to a Rerun viewer.

pixi run python tools/run_multi_img.py --image-dir /path/to/image_folder

Connect to an already running Rerun viewer (instead of spawning a new one):

pixi run python tools/run_multi_img.py --rr-config.connect --image-dir /path/to/image_folder

Notes:

  • Supported image extensions: .png, .jpg, .jpeg
  • Automatically orients & recenters camera poses ("up" orientation heuristic) and logs a consolidated point cloud plus per-view RGB, depth, filtered depth, MoGe depth, and confidence.
  • Uses VGGT (multiview geometry transformer) for joint pose & depth, robust depth confidence filtering, MoGe for refined monocular depth, and voxel downsampling to target a manageable point count.
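
The voxel downsampling mentioned above can be sketched in pure Python: bucket points into a cubic voxel grid and keep one averaged point per occupied voxel. This is a simplified stand-in for the pipeline's implementation, not its actual code:

```python
from collections import defaultdict
from math import floor

def voxel_downsample(points, voxel_size):
    """Replace all points that fall into the same cubic voxel by their centroid.

    points: iterable of (x, y, z) tuples; returns one centroid per voxel.
    """
    buckets = defaultdict(list)
    for x, y, z in points:
        key = (floor(x / voxel_size), floor(y / voxel_size), floor(z / voxel_size))
        buckets[key].append((x, y, z))
    centroids = []
    for pts in buckets.values():
        n = len(pts)
        centroids.append(tuple(sum(axis) / n for axis in zip(*pts)))
    return centroids

# Three points, two of which share the unit voxel at the origin:
cloud = [(0.0, 0.0, 0.0), (0.2, 0.1, 0.0), (5.0, 5.0, 5.0)]
print(len(voxel_downsample(cloud, voxel_size=1.0)))  # → 2
```

Shrinking voxel_size keeps more detail; growing it trades detail for a more manageable point count.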

Gradio Web Interface

Launch an interactive web interface for experimenting with the models:

pixi run python tools/gradio_app.py

Key Features

  • Single Image to 3D: Complete pipeline from single image to navigable 3D scene
  • Multi-Image Geometry: Batch multi-view camera & depth estimation with fused colored point cloud export
  • Memory Efficient: Model offloading support for GPU memory management
  • Real-time Visualization: Integrated Rerun viewer for 3D scene inspection
  • Training-free: No fine-tuning required for existing diffusion models
  • High Quality: Multi-view consistency sampling ensures coherent 3D reconstruction

Project Structure

├── src/vistadream/
│   ├── api/                 # High-level pipeline APIs
│   │   ├── flux_outpainting.py    # Outpainting-only pipeline
│   │   ├── multi_image_pipeline.py # Multi-image pose & depth fusion (VGGT + MoGe)
│   │   └── vistadream_pipeline.py # Full 3D reconstruction pipeline
│   ├── flux/                # Flux diffusion model integration
│   │   ├── cli_*.py         # Command-line interfaces
│   │   ├── model.py         # Flux transformer architecture
│   │   ├── sampling.py      # Diffusion sampling logic
│   │   └── util.py          # Model loading and configuration
│   └── ops/                 # Core operations
│       ├── flux.py          # Flux model wrappers
│       ├── gs/              # Gaussian splatting implementation
│       ├── trajs/           # Camera trajectory generation
│       └── visual_check.py  # 3D scene validation tools
└── tools/                   # Standalone applications
    ├── gradio_app.py        # Web interface
    ├── run_flux_outpainting.py
    ├── run_vistadream.py    # Main 3D pipeline
    └── run_single_img.py    # Single image processing

Model Checkpoints

Models are automatically downloaded from Hugging Face on first run. Manual download:

pixi run huggingface-cli download pablovela5620/vistadream --local-dir ckpt/

Expected structure:

ckpt/
├── flux_fill/
│   ├── flux1-fill-dev.safetensors
│   └── ae.safetensors
├── vec.pt
├── txt.pt
└── txt_256.pt
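
A quick sanity check against the expected layout above can be written with the standard library alone; the relative paths are taken verbatim from the tree, while the helper itself is just an illustrative convenience:

```python
from pathlib import Path

# Relative paths from the expected ckpt/ layout shown above.
EXPECTED_CKPT_FILES = [
    "flux_fill/flux1-fill-dev.safetensors",
    "flux_fill/ae.safetensors",
    "vec.pt",
    "txt.pt",
    "txt_256.pt",
]

def missing_checkpoints(ckpt_dir: str = "ckpt") -> list[str]:
    """Return the expected checkpoint files that are absent under ckpt_dir."""
    root = Path(ckpt_dir)
    return [rel for rel in EXPECTED_CKPT_FILES if not (root / rel).is_file()]

if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoint files:", ", ".join(missing))
    else:
        print("All expected checkpoints present.")
```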

Citation

Thanks to the original authors! If you use VistaDream in your research, please cite:

Original Repo

@inproceedings{wang2025vistadream,
  title={VistaDream: Sampling multiview consistent images for single-view scene reconstruction},
  author={Wang, Haiping and Liu, Yuan and Liu, Ziwei and Wang, Wenping and Dong, Zhen and Yang, Bisheng},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2025}
}

Acknowledgements

This project builds upon several outstanding works:

Related Work

  • ASUKA - Enhanced image inpainting for mitigating unwanted object insertion
  • MoGe - Accurate monocular geometry estimation for open-domain images
