fix: atomic writes for userdata to prevent data loss on crash#12987
fix: atomic writes for userdata to prevent data loss on crash#12987comfyanonymous merged 1 commit intomasterfrom
Conversation
Write to a temp file in the same directory then os.replace() onto the target path. If the process crashes mid-write, the original file is left intact instead of being truncated to zero bytes. Fixes #11298
📝 WalkthroughWalkthroughThe 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. 📝 Coding Plan
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
app/user_manager.py (1)
387-389: Cleanup failure can mask the original exception.If
os.unlink(tmp_path)raises (e.g., permissions issue or race condition), the original exception that triggered the cleanup is lost—line 389'sraiseis never reached. Additionally, a bareexcept:catchesKeyboardInterrupt/SystemExit.Wrap the cleanup in its own try-except to ensure the original error propagates:
♻️ Proposed fix for robust cleanup
- except: - os.unlink(tmp_path) - raise + except BaseException: + try: + os.unlink(tmp_path) + except OSError: + pass # Cleanup failed; still re-raise original + raise🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/user_manager.py` around lines 387 - 389, The current bare `except:` block around the failing operation allows `KeyboardInterrupt`/`SystemExit` to be caught and also can lose the original exception if `os.unlink(tmp_path)` raises; change the handler to `except Exception as err:` (preserving the original exception in `err`), then perform cleanup in its own try/except: `try: os.unlink(tmp_path)`; `except Exception as cleanup_err:` log or swallow `cleanup_err` but do not replace `err`; finally re-raise the original `err` (e.g., `raise`) so the original exception from the protected block (not any unlink failure) always propagates; references: `tmp_path`, `os.unlink`, and the bare `except:` in the current handler.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@app/user_manager.py`:
- Around line 387-389: The current bare `except:` block around the failing
operation allows `KeyboardInterrupt`/`SystemExit` to be caught and also can lose
the original exception if `os.unlink(tmp_path)` raises; change the handler to
`except Exception as err:` (preserving the original exception in `err`), then
perform cleanup in its own try/except: `try: os.unlink(tmp_path)`; `except
Exception as cleanup_err:` log or swallow `cleanup_err` but do not replace
`err`; finally re-raise the original `err` (e.g., `raise`) so the original
exception from the protected block (not any unlink failure) always propagates;
references: `tmp_path`, `os.unlink`, and the bare `except:` in the current
handler.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f8acd8db-519d-4de5-8d07-cfd35b0028ad
📒 Files selected for processing (1)
app/user_manager.py
Summary
The
POST /userdata/{file}endpoint opens the target path with"wb"(truncating it to zero bytes immediately) then writes the body. If the process crashes between truncation and write completion, the file is left as a zero-byte file and the workflow is lost.This changes the write to use
tempfile.mkstempin the same directory followed byos.replace(), so either the old file remains intact or the new file is fully written — never a zero-byte intermediate state.Fixes #11298
Tradeoffs
mkstemp+renameper save. Negligible vs. the HTTP round-trip + JSON serialization already happening.mkstempandos.replace, an orphaned temp file is left in the directory. This is strictly better than the current behavior (losing the workflow entirely).os.replaceatomicityos.replaceis not truly atomic on NTFS but is the best available primitive. A concurrent process holding a handle (antivirus, file indexer) could cause aPermissionError, but this is the same failure mode as the current directopen("wb")— no regression.user/directory. Custom nodes reading viaGET /userdataare unaffected. Nodes writing to the same path concurrently already have no coordination — atomic writes actually improve this by preventing partial reads.Why this is not a performance concern
.index.json), which are infrequent user actions.app/assets/) already uses this sameos.replacepattern iningest.pyfor asset uploads with no reported performance issues.