fix: atomic writes for userdata to prevent data loss on crash by christian-byrne · Pull Request #12987 · Comfy-Org/ComfyUI

christian-byrne · 2026-03-16T05:05:55Z

Summary

The POST /userdata/{file} endpoint opens the target path with "wb" (truncating it to zero bytes immediately) then writes the body. If the process crashes between truncation and write completion, the file is left as a zero-byte file and the workflow is lost.
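
The failure mode can be reproduced in isolation. The following is a minimal sketch, not the endpoint's code: the scratch directory and file name are hypothetical stand-ins for a file under `user/`, and a raised exception stands in for the process crash.

```python
import os
import tempfile

# Hypothetical scratch location standing in for the user/ directory.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "workflow.json")

# An existing, valid workflow file.
with open(path, "wb") as f:
    f.write(b'{"nodes": []}')
size_before = os.path.getsize(path)

try:
    # Opening with "wb" truncates the file right away, before any write.
    with open(path, "wb") as f:
        raise RuntimeError("simulated crash before the body is written")
except RuntimeError:
    pass

# The old content is gone even though nothing new was written.
size_after = os.path.getsize(path)
print(size_before, size_after)
```

Here `size_after` is 0: the truncation happened at `open()`, so the previous content cannot be recovered from the file.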

This changes the write to use tempfile.mkstemp in the same directory followed by os.replace(), so either the old file remains intact or the new file is fully written — never a zero-byte intermediate state.
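
The pattern described above might look roughly like the sketch below. It is an illustration under assumptions, not the code merged in this PR: the function name `atomic_write_bytes` and the temp-file prefix and suffix are hypothetical.

```python
import os
import tempfile


def atomic_write_bytes(path: str, data: bytes) -> None:
    """Write data so that path holds either the old or the new content."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the final rename stays on
    # one filesystem. The prefix and suffix are illustrative choices.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            tmp_file.write(data)
        # Swap the fully written temp file onto the target path.
        os.replace(tmp_path, path)
    except BaseException:
        # Best-effort cleanup; never let it hide the original error.
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise
```

Two caveats apply to this sketch. It does not call `os.fsync`, so it addresses a process crash rather than durability across power loss. Also, `tempfile.mkstemp` creates the file readable and writable only by the creating user, and the replaced target takes on the temp file's permissions.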

Fixes #11298

Tradeoffs

| Concern | Assessment |
| --- | --- |
| Extra syscalls | One additional `mkstemp` + `rename` per save. Negligible vs. the HTTP round-trip + JSON serialization already happening. |
| Temp file cleanup on crash | If the process dies between `mkstemp` and `os.replace`, an orphaned temp file is left in the directory. This is strictly better than the current behavior (losing the workflow entirely). |
| Windows `os.replace` atomicity | `os.replace` is not truly atomic on NTFS but is the best available primitive. A concurrent process holding a handle (antivirus, file indexer) could cause a `PermissionError`, but this is the same failure mode as the current direct `open("wb")`, so there is no regression. |
| Custom node ecosystem | No backend hooks or file watchers exist on the `user/` directory. Custom nodes reading via `GET /userdata` are unaffected. Nodes writing to the same path concurrently already have no coordination; atomic writes actually improve this by preventing partial reads. |
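
For the orphaned temp file case, one possible mitigation is a periodic sweep of stale temp files. Nothing in this PR description indicates that such a sweep exists, so the sketch below is purely illustrative: the function name, the `.tmp-` prefix, and the one-hour age threshold are all assumptions.

```python
import os
import time
from typing import List, Optional


def remove_stale_temp_files(
    directory: str,
    prefix: str = ".tmp-",            # hypothetical temp-file naming scheme
    max_age_seconds: float = 3600.0,  # hypothetical staleness threshold
    now: Optional[float] = None,
) -> List[str]:
    """Delete temp files older than max_age_seconds and return their names."""
    current = time.time() if now is None else now
    removed = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.name.startswith(prefix):
                continue
            try:
                if not entry.is_file(follow_symlinks=False):
                    continue
                age = current - entry.stat(follow_symlinks=False).st_mtime
                if age > max_age_seconds:
                    os.unlink(entry.path)
                    removed.append(entry.name)
            except OSError:
                # The file vanished or is locked; skip it and move on.
                continue
    return removed
```

The age threshold is what keeps the sweep from deleting a temp file that belongs to a save still in progress.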

Why this is not a performance concern

- Autosave is off by default. When enabled, it is debounced at a minimum of 1000ms with an in-flight guard that serializes writes, so this path cannot fire more than ~1x/sec regardless of edit rate.
- Manual saves are human-rate-limited (Ctrl+S).
- The only non-debounced writes through this path are bookmark toggles (.index.json), which are infrequent user actions.
- The assets system (app/assets/) already uses this same os.replace pattern in ingest.py for asset uploads with no reported performance issues.

Write to a temp file in the same directory then os.replace() onto the target path. If the process crashes mid-write, the original file is left intact instead of being truncated to zero bytes. Fixes #11298

coderabbitai · 2026-03-16T05:09:20Z

📝 Walkthrough

The post_userdata function in user_manager.py has been refactored to implement atomic file writing. The implementation creates a temporary file in the target directory, writes data to it, and then atomically replaces the original file using os.replace. The temporary file is explicitly cleaned up on operation failure. This change adds the tempfile module import and modifies the write operation logic without altering any public function signatures.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: implementing atomic writes using a temporary file pattern to prevent data loss during crashes. |
| Description check | ✅ Passed | The description clearly explains the problem (truncation + crash = zero-byte file), the solution (atomic writes via tempfile + os.replace), and addresses tradeoffs comprehensively. |
| Linked Issues check | ✅ Passed | The PR implements the exact solution requested in `#11298`: writing to a temporary file first, then atomically replacing the original, preventing zero-byte file loss on crash. |
| Out of Scope Changes check | ✅ Passed | All changes are in-scope: the modified post_userdata function in app/user_manager.py implements atomic write handling as specified in `#11298` with no extraneous changes. |

coderabbitai

🧹 Nitpick comments (1)

app/user_manager.py (1)

387-389: Cleanup failure can mask the original exception.

If os.unlink(tmp_path) raises (e.g., permissions issue or race condition), the original exception that triggered the cleanup is lost—line 389's raise is never reached. Additionally, a bare except: catches KeyboardInterrupt/SystemExit.

Wrap the cleanup in its own try-except to ensure the original error propagates:

♻️ Proposed fix for robust cleanup

-            except:
-                os.unlink(tmp_path)
-                raise
+            except BaseException:
+                try:
+                    os.unlink(tmp_path)
+                except OSError:
+                    pass  # Cleanup failed; still re-raise original
+                raise
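
The difference between the two handlers can be checked with a small standalone script. This is a sketch under assumptions, not code from `app/user_manager.py`: the function names are hypothetical, a `ValueError` stands in for the original write failure, and a path that does not exist forces `os.unlink` to fail.

```python
import os
import tempfile


def fail_with_naive_cleanup(tmp_path):
    try:
        raise ValueError("original failure")
    except:  # bare except, as in the handler under review
        os.unlink(tmp_path)  # if this raises, the re-raise below never runs
        raise


def fail_with_guarded_cleanup(tmp_path):
    try:
        raise ValueError("original failure")
    except BaseException:
        try:
            os.unlink(tmp_path)
        except OSError:
            pass  # cleanup failed; still re-raise the original
        raise


def raised_type(func, tmp_path):
    """Return the type of the exception that func lets escape."""
    try:
        func(tmp_path)
    except BaseException as exc:
        return type(exc)
    return None


# A path that does not exist, so os.unlink raises FileNotFoundError.
missing = os.path.join(tempfile.mkdtemp(), "already-gone.tmp")

naive_result = raised_type(fail_with_naive_cleanup, missing)
guarded_result = raised_type(fail_with_guarded_cleanup, missing)
print(naive_result.__name__, guarded_result.__name__)
```

With the naive handler the caller sees `FileNotFoundError`; with the guarded handler it sees the original `ValueError`. In Python 3 the original exception is still attached to the unlink error as `__context__`, but any caller catching by exception type handles the wrong error.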

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@app/user_manager.py` around lines 387 - 389, The current bare `except:` block
around the failing operation allows `KeyboardInterrupt`/`SystemExit` to be
caught and also can lose the original exception if `os.unlink(tmp_path)` raises;
change the handler to `except Exception as err:` (preserving the original
exception in `err`), then perform cleanup in its own try/except: `try:
os.unlink(tmp_path)`; `except Exception as cleanup_err:` log or swallow
`cleanup_err` but do not replace `err`; finally re-raise the original `err`
(e.g., `raise`) so the original exception from the protected block (not any
unlink failure) always propagates; references: `tmp_path`, `os.unlink`, and the
bare `except:` in the current handler.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f8acd8db-519d-4de5-8d07-cfd35b0028ad

📥 Commits

Reviewing files that changed from the base of the PR and between 593be20 and 499abac.

📒 Files selected for processing (1)

app/user_manager.py

fix: atomic writes for userdata to prevent data loss on crash

499abac

christian-byrne requested review from Kosinkadink, comfyanonymous and guill as code owners March 16, 2026 05:05

christian-byrne assigned guill and Kosinkadink Mar 16, 2026

coderabbitai bot reviewed Mar 16, 2026

comfyanonymous merged commit 9a870b5 into master Mar 17, 2026
15 checks passed

comfyanonymous deleted the fix/atomic-userdata-writes branch March 17, 2026 01:56
