Fix: Remove Duplicate Entity/Relation Tracking Deletion in adelete_by_doc_id #2322

Merged
danielaskdd merged 1 commit into HKUDS:main from danielaskdd:fix-delete
Nov 6, 2025

Conversation

@danielaskdd
Collaborator

🐛 Fix: Remove Duplicate Entity/Relation Tracking Deletion in adelete_by_doc_id

Problem

In the adelete_by_doc_id function, two separate calls to relation_chunks.delete could delete the same relation chunk tracking records:

  1. First deletion (around line 2656): During batch chunk tracking updates, relations with no remaining chunks were deleted
  2. Second deletion (around line 2729): When deleting relationship graph edges, corresponding chunk tracking records were deleted again

While this didn't cause errors (delete operations are idempotent), it resulted in:

  • Unnecessary performance overhead
  • Code redundancy and reduced maintainability
  • Unclear separation of concerns

Solution

Refactored the chunk tracking update logic to eliminate duplicate deletions:

Changes in the first section (Chunk Tracking Updates):

  • Removed the relation_delete_ids set and its associated deletion logic
  • Modified the loop to skip empty relations using continue instead of deleting them
  • Added a clarifying comment: "Empty relations are deleted alongside graph edges later"
  • Now only performs upsert operations for relations with remaining chunks

Preserved the second section (Graph Element Deletion):

  • Kept this as the single source of truth for deleting relation chunk tracking records
  • Ensures consistent deletion across all storage layers (chunk tracking, vector DB, graph DB)

Note: The same optimization pattern was applied to entity_chunks for code consistency.
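
The two-phase flow described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the InMemoryKV class, the delete_doc_chunks function, and the "chunk_ids" record layout are all hypothetical stand-ins, and the real adelete_by_doc_id also removes graph edges and vector DB entries in the second phase.

```python
import asyncio


class InMemoryKV:
    """Hypothetical stand-in for a chunk-tracking store (not LightRAG's real class)."""

    def __init__(self, data):
        self.data = dict(data)
        self.delete_calls = 0  # lets us confirm deletion happens only once

    async def upsert(self, records):
        self.data.update(records)

    async def delete(self, ids):
        self.delete_calls += 1
        for key in ids:
            # Missing keys are ignored, which is what makes delete idempotent.
            self.data.pop(key, None)


async def delete_doc_chunks(relation_chunks, removed_chunk_ids):
    """Update tracking for removed chunks; return the relation ids that were deleted."""
    updates = {}
    relations_to_delete = []

    for rel_id, record in list(relation_chunks.data.items()):
        remaining = [c for c in record["chunk_ids"] if c not in removed_chunk_ids]
        if not remaining:
            # Empty relations are deleted alongside graph edges later
            relations_to_delete.append(rel_id)
            continue
        if len(remaining) != len(record["chunk_ids"]):
            updates[rel_id] = {"chunk_ids": remaining}

    # Phase 1 (chunk tracking updates): upserts only, no deletes.
    if updates:
        await relation_chunks.upsert(updates)

    # Phase 2 (graph element deletion): the single place where tracking
    # records are deleted. Graph edge and vector DB cleanup is omitted here.
    if relations_to_delete:
        await relation_chunks.delete(relations_to_delete)

    return relations_to_delete


store = InMemoryKV(
    {
        "A->B": {"chunk_ids": ["c1", "c2"]},  # partially affected
        "B->C": {"chunk_ids": ["c1"]},        # fully deleted
        "C->D": {"chunk_ids": ["c3"]},        # untouched
    }
)
deleted = asyncio.run(delete_doc_chunks(store, {"c1"}))
```

In this sketch, removing chunk c1 deletes only the B->C record, and the store's delete method is called exactly once.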

Benefits

  • Eliminates redundant operations: No duplicate deletion attempts
  • Improves performance: Reduces unnecessary async operations
  • Clearer code logic: Single responsibility - updates in one place, deletions in another
  • Better maintainability: Clear comments explain the design rationale
  • Consistent pattern: Applied the same optimization to entity_chunks for consistency

Testing Recommendations

  1. Test document deletion with relations that:
    • Are fully deleted (all chunks removed)
    • Are partially affected (some chunks remain)
    • Have no chunk references
  2. Verify chunk tracking storage consistency after deletion
  3. Confirm no regression in deletion functionality
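
The three relation cases in step 1 could be exercised with checks along these lines. This is only a sketch: plan_tracking_changes is a hypothetical pure helper written for illustration, not a function in the repository, and treating a relation with no chunk references like a fully deleted one is an assumption, not behavior confirmed by the PR.

```python
def plan_tracking_changes(tracked, removed_chunk_ids):
    """Split tracked relations into (updates, deletions) for a set of removed chunks.

    Hypothetical helper: `tracked` maps a relation id to its list of chunk ids.
    """
    updates, deletions = {}, []
    for rel_id, chunk_ids in tracked.items():
        remaining = [c for c in chunk_ids if c not in removed_chunk_ids]
        if not remaining:
            # Left for the graph-edge deletion step; no delete is issued here.
            deletions.append(rel_id)
        elif len(remaining) < len(chunk_ids):
            updates[rel_id] = remaining
    return updates, deletions


def test_fully_deleted():
    updates, deletions = plan_tracking_changes({"A->B": ["c1", "c2"]}, {"c1", "c2"})
    assert updates == {}
    assert deletions == ["A->B"]


def test_partially_affected():
    updates, deletions = plan_tracking_changes({"A->B": ["c1", "c2"]}, {"c1"})
    assert updates == {"A->B": ["c2"]}
    assert deletions == []


def test_no_chunk_references():
    # Assumption: an empty chunk list is handled like a fully deleted relation.
    updates, deletions = plan_tracking_changes({"A->B": []}, {"c1"})
    assert updates == {}
    assert deletions == ["A->B"]


for test in (test_fully_deleted, test_partially_affected, test_no_chunk_references):
    test()
```

Steps 2 and 3 still need to run against the real storage backends, since a pure helper like this cannot show whether chunk tracking, the vector DB, and the graph DB stay consistent.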

Impact

  • Scope: Document deletion workflow (adelete_by_doc_id)
  • Risk: Low - logic simplification with the same functional outcome
  • Backward compatibility: ✅ Fully compatible - no API changes

@danielaskdd
Collaborator Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 👍

@danielaskdd danielaskdd merged commit 366a1e0 into HKUDS:main Nov 6, 2025
1 check passed
@danielaskdd danielaskdd deleted the fix-delete branch November 7, 2025 08:25