Updated on Dec 7, 2023. Device: RTX 3090
| SDXL1.0-base (1024x1024) | torch (baseline) | onediff (optimized) | Percentage improvement |
|---|---|---|---|
| Stable Diffusion workflow(UNet) | 4.08it/s | 6.13it/s | 50.25% |
| LoRA workflow | 4.05it/s | 6.14it/s | 51.60% |
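The "Percentage improvement" column is the relative gain in iterations per second over the torch baseline. A short sketch of the arithmetic behind the table:

```python
# speedup (%) = (optimized_its - baseline_its) / baseline_its * 100
rows = {
    "Stable Diffusion workflow (UNet)": (4.08, 6.13),
    "LoRA workflow": (4.05, 6.14),
}
for name, (baseline, optimized) in rows.items():
    improvement = (optimized - baseline) / baseline * 100
    print(f"{name}: {improvement:.2f}%")  # reproduces 50.25% and 51.60%
```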
- Install and set up ComfyUI
- Install PyTorch and OneFlow
Install PyTorch:

```shell
pip install torch torchvision torchaudio
```

Install OneFlow Community (CUDA 11.x):

```shell
pip install --pre oneflow -f https://oneflow-pro.oss-cn-beijing.aliyuncs.com/branch/community/cu118
```

Install OneFlow Community (CUDA 12.x):

```shell
pip install --pre oneflow -f https://oneflow-pro.oss-cn-beijing.aliyuncs.com/branch/community/cu121
```

- Install onediff

```shell
git clone https://github.com/Oneflow-Inc/onediff.git
cd onediff && pip install -e .
```

- Install onediff_comfy_nodes for ComfyUI

```shell
cd onediff
cp -r onediff_comfy_nodes path/to/ComfyUI/custom_nodes/
```

- (Optional) Advanced features
If you need unrestricted multiple-resolution support, quantization, dynamic batch size, or any other advanced features, please send an email to caishenghang@oneflow.org and tell us about your use case, deployment scale, and requirements.
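After running the installation steps above, a quick sanity check is to confirm that the installed packages can be found by the interpreter. A minimal sketch (illustrative, not part of onediff):

```python
# Check that each required package is importable after installation.
import importlib.util

for pkg in ("torch", "oneflow", "onediff"):
    found = importlib.util.find_spec(pkg) is not None
    print(f"{pkg}: {'found' if found else 'MISSING'}")
```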
Note: All the images in this section can be loaded directly into ComfyUI.
OneDiff Community

Customized Features:
- Customized node for customized pipeline
- Quantization
- Quantized Model Saver
- Quantized Model Loader
- Multiple resolutions and dynamic batch size (zero overhead when switching input shapes)
- Contact us for more details: caishenghang@oneflow.org
The "Model Speedup" node takes a model as input and outputs an optimized model.
If `static_mode` is enabled (the default), some compilation time is required before the first inference.
If `static_mode` is disabled, no additional compilation time is needed before the first inference, but inference will be slightly slower than with it enabled.
The optimized model from the "Model Speedup" node can be saved as a graph file by the "Model Graph Saver" node, allowing it to be reused in other scenarios without recompilation.
You can set different file name prefixes for different types of models.
The "Model Graph Loader" node is used to load graph files from the disk, thus saving the time required for the initial compilation.
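The compile-once, load-later lifecycle that these three nodes implement can be sketched as a generic cache-on-disk pattern. The function names and file path below are illustrative stand-ins, not onediff's Python API:

```python
import os
import pickle

GRAPH_PATH = "unet_graph.pkl"  # hypothetical cache file name

def compile_model(model):
    # Stands in for the expensive one-time optimization done by "Model Speedup".
    return {"model": model, "compiled": True}

def get_optimized(model):
    # First call: compile and persist the graph (the "Model Graph Saver" role).
    # Later calls: load it from disk (the "Model Graph Loader" role),
    # skipping the initial compilation entirely.
    if os.path.exists(GRAPH_PATH):
        with open(GRAPH_PATH, "rb") as f:
            return pickle.load(f)
    graph = compile_model(model)
    with open(GRAPH_PATH, "wb") as f:
        pickle.dump(graph, f)
    return graph
```

Using different file name prefixes per model type, as the text suggests, simply means keeping one such cache file per model.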
Note: The quantization feature is only supported in OneDiff Enterprise.
The "UNet Loader Int8" node is used to load quantized models. Quantized models need to be used in conjunction with the "Model Speedup" node.
The compilation result of the quantized model can also be saved as a graph and loaded when needed.
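As a generic illustration of what int8 quantization means (this is the standard symmetric per-tensor scheme, not OneDiff Enterprise's implementation), weights are rescaled into the int8 range and a single scale factor is kept for dequantization:

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (generic illustration only)."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid a zero scale
    return [round(w / scale) for w in weights], scale

def dequantize_int8(quantized, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [q * scale for q in quantized]
```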
The VAE nodes for accelerating, saving, and loading VAE graphs work much like the corresponding Model nodes.
Omitting specific details here, the following workflows can be loaded and tested:
- VAE Speedup and Graph Saver
- VAE Speedup and Graph Loader
Similar to the "Model Speedup" node, this node accelerates the Stable Video Diffusion (SVD) model, completing the acceleration of the text-to-video pipeline.
It is compatible with the "Model Graph Loader" and "Model Graph Saver" nodes.
Omitting specific details here, the following workflow can be loaded and tested.
This node can further accelerate onediff's SVD support using DeepCache.
Omitting specific details here, the following workflow can be loaded and tested.
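DeepCache's core idea is that the deep, expensive UNet features change slowly across denoising steps, so they can be recomputed only every few steps while the cheap shallow layers reuse the cached result. A toy sketch of that schedule (illustrative only, not onediff's implementation):

```python
def denoise_schedule(num_steps, cache_interval=3):
    """Record which deep-feature computation each denoising step uses:
    a full (expensive) UNet pass runs only every `cache_interval` steps,
    and the steps in between reuse the cached deep features."""
    log = []
    deep_features = None
    for step in range(num_steps):
        if step % cache_interval == 0:
            deep_features = f"deep@{step}"  # full, expensive UNet pass
        log.append(deep_features)           # shallow layers reuse the cache
    return log
```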
The "Image Distinction Scanner" node is used to compare two images and visualize the resulting differences.
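At its simplest, such a comparison is a per-pixel absolute difference. A minimal sketch on 2-D lists of grayscale values (the actual node operates on ComfyUI image tensors and renders a visualization):

```python
def image_difference(img_a, img_b):
    """Return the per-pixel absolute-difference map of two equally sized
    grayscale images (2-D lists), plus the mean difference."""
    diff = [[abs(a - b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]
    mean = sum(sum(row) for row in diff) / (len(diff) * len(diff[0]))
    return diff, mean
```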