
Feat: Add Optional LLM Cache Deletion for Document Deletion #2244

Merged
danielaskdd merged 4 commits into HKUDS:main from danielaskdd:del-doc-cache
Oct 22, 2025

Conversation

@danielaskdd
Collaborator

Feat: Add Optional LLM Cache Deletion for Document Deletion

Summary

Implements an optional feature to delete cached LLM extraction results when deleting documents, providing users with better control over cache management and storage cleanup.


🎯 Motivation

Previously, when deleting documents, the associated LLM cache entries (entity extraction results) remained in storage indefinitely. This led to:

  • Storage bloat from orphaned cache entries
  • No cleanup path for removing extraction caches of particular documents
  • User confusion about what data remains after document deletion

This PR addresses these issues by adding an optional cache cleanup mechanism during document deletion.


🔧 Changes Made

Backend Changes

lightrag/lightrag.py

  • Added delete_llm_cache parameter to the adelete_by_doc_id() method (defaults to False for backward compatibility); see the usage sketch after this list
  • Implemented cache ID collection from chunk data before chunk deletion
  • Added cache deletion logic after graph operations complete
  • Proper error handling with comprehensive logging
  • Pipeline status updates for user visibility
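
A minimal usage sketch of the new parameter; the rag variable and document IDs are illustrative, while adelete_by_doc_id and delete_llm_cache come from this PR:

```python
from lightrag import LightRAG

async def remove_docs(rag: LightRAG) -> None:
    # Default behavior: the document is deleted, the LLM extraction cache is kept
    await rag.adelete_by_doc_id("doc-xxxxxxxx")

    # Opt in to cache cleanup for another document
    await rag.adelete_by_doc_id("doc-yyyyyyyy", delete_llm_cache=True)
```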

lightrag/api/routers/document_routes.py

  • Added delete_llm_cache field to the DeleteDocRequest model (sketched after this list)
  • Removed an unnecessary restriction that prevented deletion when the LLM cache was disabled
  • Updated background_delete_documents() to pass cache deletion flag
  • Enhanced API documentation with cache cleanup details
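
A hedged sketch of the extended request model: the delete_llm_cache field and its default come from this PR, while the surrounding field names are assumptions for illustration:

```python
from pydantic import BaseModel, Field

class DeleteDocRequest(BaseModel):
    # Assumed pre-existing field: IDs of the documents to delete
    doc_ids: list[str] = Field(..., description="Documents to delete")
    # New in this PR: opt-in LLM extraction cache cleanup,
    # defaulting to False for backward compatibility
    delete_llm_cache: bool = Field(
        default=False,
        description="Also delete the cached LLM extraction results for these documents",
    )
```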

Frontend Changes

lightrag_webui/src/api/lightrag.ts

  • Added deleteLLMCache parameter to deleteDocuments() function

lightrag_webui/src/components/documents/DeleteDocumentsDialog.tsx

  • Added checkbox for "Delete LLM cache" option
  • Proper state management for the new option
  • Reset state when dialog closes

Localization

  • Added deleteLLMCacheOption translations for 5 languages:
    • English: "Also delete extracted LLM cache"
    • Chinese (Simplified): "同时删除实体关系抽取 LLM 缓存"
    • Chinese (Traditional): "同時刪除實體關係擷取 LLM 快取"
    • French: "Supprimer également le cache LLM d'extraction"
    • Arabic: "حذف ذاكرة LLM المؤقتة للاستخراج أيضًا"

💡 Implementation Details

Cache Collection Strategy

```python
# Collect cache IDs BEFORE deleting chunks, while the chunk
# records (and their llm_cache_list fields) still exist
if delete_llm_cache and chunk_ids:
    cache_ids: set[str] = set()
    chunk_data_list = await self.text_chunks.get_by_ids(list(chunk_ids))
    for chunk_data in chunk_data_list:
        # Deduplicate cache IDs shared across chunks via the set
        cache_ids.update(chunk_data.get("llm_cache_list", []))
```
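
Collecting the IDs into a single set up front both deduplicates cache keys shared across chunks and guarantees they are read while the chunk records still exist.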

Deletion Timing

  1. ✅ Collect cache IDs (before chunks are deleted)
  2. ✅ Delete chunks and graph elements
  3. ✅ Rebuild affected entities/relationships
  4. ✅ Delete cache entries (after graph operations complete)

This order (see the sketch after this list) ensures:

  • Cache IDs are accessible when needed
  • Graph integrity is maintained
  • Cache cleanup doesn't interfere with rebuilding
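
A minimal sketch of that ordering under stated assumptions: the private helper names are hypothetical, while text_chunks.get_by_ids, llm_cache_list, and the delete_llm_cache default come from this PR:

```python
async def adelete_by_doc_id(self, doc_id: str, delete_llm_cache: bool = False) -> None:
    chunk_ids = await self._get_chunk_ids_for_doc(doc_id)  # hypothetical helper

    # Step 1: collect cache IDs while the chunk records are still readable
    cache_ids: set[str] = set()
    if delete_llm_cache and chunk_ids:
        for chunk in await self.text_chunks.get_by_ids(list(chunk_ids)):
            cache_ids.update(chunk.get("llm_cache_list", []))

    # Steps 2-3: remove chunks and graph elements, then rebuild what they touched
    await self._delete_chunks_and_graph_elements(chunk_ids)  # hypothetical helper
    await self._rebuild_affected_entities_and_relations(chunk_ids)  # hypothetical helper

    # Step 4: only now drop the cache entries, so cleanup
    # cannot interfere with the graph rebuild above
    if cache_ids:
        await self.llm_response_cache.delete(list(cache_ids))  # storage API assumed
```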

Error Handling

  • Validates storage availability before operations
  • Catches and logs errors without breaking the main deletion flow (see the sketch below)
  • Provides clear feedback via pipeline status updates
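
A hedged sketch of that pattern; logger, pipeline_status, and the llm_response_cache attribute are assumptions for illustration, not the PR's literal code:

```python
if cache_ids:
    # Validate storage availability before attempting deletion
    if self.llm_response_cache is None:
        logger.warning("LLM cache storage unavailable; skipping cache cleanup")
    else:
        try:
            await self.llm_response_cache.delete(list(cache_ids))
            # Surface progress to users via the pipeline status
            pipeline_status["latest_message"] = (
                f"Deleted {len(cache_ids)} LLM cache entries for {doc_id}"
            )
        except Exception as exc:
            # A cache-cleanup failure must not break the main deletion flow
            logger.error(f"Failed to delete LLM cache entries: {exc}")
```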

🔄 Breaking Changes

None. This is a backward-compatible addition; existing code continues to work unchanged because delete_llm_cache defaults to False.

• Add delete_llm_cache parameter to API
• Collect cache IDs from text chunks
• Delete cache after graph operations
• Update UI with new checkbox option
• Add i18n translations for cache option
danielaskdd merged commit 20edd32 into HKUDS:main on Oct 22, 2025
1 check passed
danielaskdd deleted the del-doc-cache branch on October 22, 2025 at 07:46