[NeurIPS 2025] TrackingWorld: World-centric Monocular 3D Tracking of Almost All Pixels

Authors: Jiahao Lu, Weitao Xiong, Jiacheng Deng, Peng Li, Tianyu Huang, Zhiyang Dou, Cheng Lin, Sai-Kit Yeung, Yuan Liu

NeurIPS 2025


TrackingWorld is a novel approach for dense, world-centric 3D tracking from monocular videos. Our method estimates accurate camera poses and disentangles 3D trajectories of both static and dynamic components — not limited to a single foreground object. It supports dense tracking of nearly all pixels, enabling robust 3D scene understanding from monocular inputs.


🖼️ Teaser

Watch the teaser


⚙️ Setup and Installation

TrackingWorld depends on several visual foundation model repositories, included as git submodules, for its preprocessing pipeline.

1. Cloning the Repository

Clone with the --recursive flag to fetch the main repository and all necessary submodules (if you have already cloned without it, run git submodule update --init --recursive inside the repository):

git clone --recursive https://github.com/IGL-HKUST/TrackingWorld.git
cd TrackingWorld

2. Environment Setup

An installation script is provided; it has been tested with CUDA Toolkit 12.1 and Python 3.10.

conda create -n trackingworld python=3.10
conda activate trackingworld
bash scripts/install.sh

3. Downloading Weights

Download the necessary model weights for the visual foundation models used in the pipeline:

bash scripts/download.sh

4. OpenAI API Key (For Preprocessing)

The initial preprocessing stage queries GPT through the OpenAI API (credit usage should be minimal). Set your API key as an environment variable in a .env file:

echo "OPENAI_API_KEY=sk-your_api_key_here" > .env

You can find your API key in your OpenAI account dashboard.
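
For reference, the snippet below is a minimal sketch of how the key can be read back in Python, assuming the pipeline loads it with the python-dotenv package (the package name and loading logic are assumptions, not the repository's actual code):

# Minimal sketch: read OPENAI_API_KEY from .env using python-dotenv
# (an assumed dependency; the repository's actual loading code may differ).
import os

from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env into the environment
api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("OPENAI_API_KEY not set; check your .env file")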


🚀 Demo

We've included the dog sequence from the DAVIS dataset as a demonstration. You can run the entire processing pipeline using the following convenience script:

bash scripts/demo.sh

📁 Output Structure

The demo writes intermediate and final results to the data/demo_data/ directory, showing the progression from foundation model outputs to the final 4D representation. You can also download a preprocessed version of the results here.

data/demo_data/
└── dog/                               # 🐾 Demo Sequence Name (e.g., DAVIS 'dog')
    ├── color/                         # Original RGB Images
    │   └── 00000.jpg, ...             # Sequential RGB frames
    │
    ├── deva/                          # DEVA Model Outputs (Video Segmentation)
    │   └── pred.json, Annotations/, ...
    │
    ├── ram/                           # RAM Model Outputs (Image Tagging)
    │   └── tags.json                  # Contains RAM tags, GPT filtering results, and detected classes
    │
    ├── unidepth/                      # Depth Estimation Results
    │   ├── depth.npy                  # Raw depth maps
    │   └── intrinsics.npy             # Camera intrinsic parameters
    │
    ├── gsm2/                          # GSM2 Model Outputs (Instance/Semantic Segmentation)
    │   └── mask/, vis/, ...
    │
    ├── densetrack3d_efep/             # DenseTrack3D / CoTracker Outputs
    │   └── results.npz                # Dense tracklet data
    │
    └── uni4d/                         # Final Uni4D Reconstruction Outputs
        └── experiment_name/           # Experiment Name (e.g., base_delta_ds2)
            ├── fused_track_4d_full.npz    # 🔑 Fused 4D Representation (Main Output)
            └── training_info.log          # Training metadata
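
Before visualizing, you can sanity-check these outputs with NumPy. The snippet below is a minimal sketch: the array shapes and the member names inside fused_track_4d_full.npz are not documented here, so it only prints what the files actually contain.

# Minimal sketch: inspect the demo outputs with NumPy.
# Paths follow the tree above; the array contents are assumptions.
import numpy as np

root = "data/demo_data/dog"

depth = np.load(f"{root}/unidepth/depth.npy")            # raw depth maps
intrinsics = np.load(f"{root}/unidepth/intrinsics.npy")  # camera intrinsics
print("depth:", depth.shape, "| intrinsics:", intrinsics.shape)

# The fused 4D representation is an .npz archive; list its member arrays.
fused = np.load(f"{root}/uni4d/base_delta_ds2/fused_track_4d_full.npz",
                allow_pickle=True)
print("arrays in fused_track_4d_full.npz:", fused.files)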

✨ Visualization

To visualize the dense 4D trajectories and the reconstructed scene, run the provided visualization script, pointing it to the main output file:

python visualizer/vis_trackingworld.py --filepath data/demo_data/dog/uni4d/base_delta_ds2/fused_track_4d_full.npz

This visualization helps interpret the world-centric motion and disentangled trajectories generated by TrackingWorld.

Watch the video

📜 To-Do List

We plan to release more features and data soon.

  • Release demo code
  • Provide evaluation benchmark and metrics

📝 Citation

If you find TrackingWorld useful for your research or applications, please consider citing our paper:

@inproceedings{lu2025trackingworld,
    title={TrackingWorld: World-centric Monocular 3D Tracking of Almost All Pixels},
    author={Jiahao Lu and Weitao Xiong and Jiacheng Deng and Peng Li and Tianyu Huang and Zhiyang Dou and Cheng Lin and Sai-Kit Yeung and Yuan Liu},
    booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
    year={2025},
    url={https://openreview.net/forum?id=vDV912fa3t}
}

🤝 Acknowledgements

Our codebase is built on Uni4D. Our preprocessing relies on DELTA, CoTrackerV3, UniDepth, Tracking-Anything-with-DEVA, Grounded-SAM-2, and Recognize-Anything. We thank the authors for their excellent work!
