TorchCompileModel_LoRASafe is a drop-in replacement for ComfyUI's stock
TorchCompileModel node, designed for workflows that rely on model patching
(e.g., LoRA stacks, TEA-Cache, Sage-Attention, and related runtime patches).
The main difference from the stock node is when compilation is applied: this node compiles after patching so your patches stay active instead of being silently bypassed.
In many optimized ComfyUI workflows, the diffusion model is patched before sampling starts. If compile wrapping happens at the wrong stage, you can end up compiling a pre-patch object and losing expected patch behavior.
This node is built to avoid that failure mode:
- Patch first
- Compile lazily on first real execution
- Keep patched behavior intact while still benefiting from
torch.compile
- LoRA-safe compile flow for patched diffusion models
- Lazy compilation (first forward pass triggers compile)
- Whole-model compile by default
- Transformer-only compile mode for architectures where compiling only block lists is more stable or faster to warm up
- ComfyUI-aligned stability behavior: attempts
model.clone(disable_dynamic=True)and (when available) uses ComfyUI'sguard_filter_fnto avoid unnecessary guards on dynamictransformer_optionsdictionaries - Backend support:
inductorcudagraphs(CUDA required; generally best on newer GPUs)nvfuser(CUDA required)
This node includes Flux-specific compile target discovery intended to improve compatibility with FLUX.2 KLEIN 9B and other Flux-style workflows.
For Flux-style models in ComfyUI, block forward passes are highly dynamic
(transformer_options mutation, dynamic patch replacement, and patch hooks
around attention). Because of that, this node does not compile whole
double_blocks.{i} / single_blocks.{i} forwards.
Instead, when compile_transformer_only = true, Flux targets strict leaf
submodules by default.
Strict Flux names targeted by default:
double_blocks.{i}.img_attn.qkvdouble_blocks.{i}.img_attn.projdouble_blocks.{i}.txt_attn.qkvdouble_blocks.{i}.txt_attn.projsingle_blocks.{i}.linear1single_blocks.{i}.linear2
For cudagraph-capable Flux runs (backend = cudagraphs, or backend = inductor
with disable_cudagraphs = false), the behavior splits into two safe modes:
compile_transformer_only = true- target set is narrowed to the cudagraph-safe qkv subset only:
double_blocks.{i}.img_attn.qkvdouble_blocks.{i}.txt_attn.qkv
- target set is narrowed to the cudagraph-safe qkv subset only:
compile_transformer_only = false- the node does not compile the outer
diffusion_modelwrapper path. - instead, it widens to a Flux full-leaf compile across the model's parameterized leaf modules.
- the node does not compile the outer
This avoids full-model cudagraph capture through Flux / ComfyUI
WrapperExecutor paths, which are prone to graph-break / replay failures when
runtime patches such as Sage attention are active.
In cudagraph-capable Flux mode, inferred Flux targets and full-model fallback are intentionally disabled to avoid widening back into unstable outer-wrapper paths.
Optional widening controls:
-
allow_flux_inferred_targets(falseby default)false: use only strict names above.true: allow inferred Flux leaf targets (includingimg_mlp/txt_mlppattern matches) for compatibility across variants.- Ignored for cudagraph-capable Flux runs, which stay on the strict qkv-only set.
-
fallback_to_full_model_if_no_targets(falseby default)false: if transformer-only target discovery finds nothing, skip compile instead of silently widening scope.true: if discovery finds nothing, fall back to compilingdiffusion_model.- Ignored for cudagraph-capable Flux runs, which skip compile instead of widening scope.
For non-Flux transformer-style models, it compiles known block containers such as:
layerstransformer_blocksblocksvisual_transformer_blockstext_transformer_blocks
If no known transformer compile targets are found, behavior is controlled by
fallback_to_full_model_if_no_targets (default false, meaning skip compile).
- Clone or download this repository.
- Copy the
lora_safe_compilefolder into your ComfyUIcustom_nodes/directory. - Restart ComfyUI.
- Search for TorchCompileModel_LoRASafe in: model / optimisation 🛠️
-
model(MODEL)- The model object to patch and compile.
-
backend(inductor,cudagraphs,nvfuser)inductor: default recommendation for most users.cudagraphs: CUDA-only; can reduce overhead in stable-shape loops.nvfuser: CUDA-only; may vary by PyTorch/CUDA environment.
-
mode(default,reduce-overhead,max-autotune)- Passed through to
torch.compilemode behavior.
- Passed through to
-
fullgraph(BOOLEAN)- Requests full-graph capture. Can improve performance in some setups, but may reduce tolerance for graph breaks.
-
dynamic(BOOLEAN)- Enables dynamic-shape aware compilation behavior.
- Note: the node still attempts
model.clone(disable_dynamic=True)first to match ComfyUI stock compile behavior when supported by the model object.
-
disable_cudagraphs(BOOLEAN, defaulttrue)- For
backend = inductor, addstorch.compile(..., options={'triton.cudagraphs': False}). - Not applied to the
cudagraphsbackend. - Recommended if you hit
cudaMallocAsync/ cudagraph instability in Inductor-based runs. - Set to
falseonly when you specifically want Inductor cudagraphs.
- For
-
compile_transformer_only(BOOLEAN)false(default): compile the entirediffusion_modelfor non-Flux models.true: compile discovered transformer targets only.- Flux behavior (default strict mode): targets only
qkv/projattention leaves pluslinear1/linear2leaves. - For cudagraph-capable Flux runs:
true-> narrowed further toimg_attn.qkvandtxt_attn.qkvonly.false-> widens to a Flux full-leaf compile instead of compiling the outerdiffusion_modelwrapper path.
-
allow_flux_inferred_targets(BOOLEAN, defaultfalse)false(recommended): strict Flux target set only.true: allow inferred Flux leaf targets (img_mlp/txt_mlp-style matches) when preferred names differ.- Ignored for cudagraph-capable Flux runs, which stay on the strict qkv-only set.
-
fallback_to_full_model_if_no_targets(BOOLEAN, defaultfalse)false(recommended): if transformer-only discovery finds nothing, skip compile rather than widening scope.true: if transformer-only discovery finds nothing, compilediffusion_model.- Ignored for cudagraph-capable Flux runs, which skip compile instead of widening scope.
If you're unsure where to begin:
- Backend:
inductor disable_cudagraphs:true(recommended default for stability). Set tofalseonly when you specifically want Inductor cudagraphs.- Mode:
default fullgraph:falsedynamic:falsecompile_transformer_only:truefor FLUX-style models (including FLUX.2 KLEIN 9B); otherwise start withfalseand compare.allow_flux_inferred_targets:false(recommended for debugging and stable targeting).fallback_to_full_model_if_no_targets:false(recommended to avoid silent whole-model escalation).
Then iterate one setting at a time based on stability and speed.
-
First generation after graph changes can be slower (compile warm-up).
-
Recompiles can occur when shapes/settings change substantially.
-
Best backend can differ across GPUs, drivers, and PyTorch versions.
-
If a backend fails, switch to
inductorfirst. -
If you encounter
cudaMallocAsync/cudagraph issues, keepdisable_cudagraphs = true. -
For cudagraph-enabled Flux validation, start with qkv-only targeting and do not widen to
projunless logs prove the path is stable. -
Guard filtering: when ComfyUI's
skip_torch_compile_dictis available, this node passes it asoptions['guard_filter_fn']to reduce guard churn on dynamictransformer_optionsdictionaries. -
If full-model compile is unstable on your graph, enable
compile_transformer_only.
-
Issue: Backend error on startup or first sample.
- Try
backend = inductor. - Keep
disable_cudagraphs = true.
- Try
-
Issue: Very long first run.
- Expected in compile warm-up; test subsequent runs before evaluating.
-
Issue: Unstable behavior with full model compile.
- Set
compile_transformer_only = true.
- Set
-
Issue: CUDA backend unavailable.
- Use
inductor, or verify CUDA-compatible PyTorch/GPU environment.
- Use