Feat: Add LLM Query Cache Cleanup Tool #2335

Merged
danielaskdd merged 2 commits into HKUDS:main from danielaskdd:llm-cache-cleanup
Nov 9, 2025

@danielaskdd
Collaborator

Feat: Add LLM Query Cache Cleanup Tool

Overview

This PR introduces a new command-line tool for cleaning up LLM query cache entries stored in LightRAG's KV storage systems. The tool provides selective cleanup capabilities for query caches generated during RAG operations (modes: mix, hybrid, local, global).

Motivation

LLM query caches accumulate over time and can consume significant storage space. Users need a reliable way to:

  • Clean up outdated query caches across different storage backends
  • Selectively remove specific cache types (query vs keywords)
  • View statistics before and after cleanup
  • Handle cleanup errors gracefully

Implementation

Core Features

1. Multi-Storage Support

  • JsonKVStorage (file-based JSON)
  • RedisKVStorage (Redis database)
  • PGKVStorage (PostgreSQL)
  • MongoKVStorage (MongoDB)

2. Cache Type Management

  • Supports 4 query modes: mix, hybrid, local, global
  • Handles 2 cache types: query and keywords
  • Key format: <mode>:<cache_type>:<hash> (e.g., mix:query:abc123, global:keywords:def456)
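
As an illustration of the key format above, the following is a minimal sketch of how such keys could be split and validated. The names `CacheKey` and `parse_query_cache_key` are hypothetical and are not taken from the tool's source.

```python
from typing import NamedTuple, Optional

# Modes and cache types as listed in the PR description.
QUERY_MODES = {"mix", "hybrid", "local", "global"}
CACHE_TYPES = {"query", "keywords"}


class CacheKey(NamedTuple):
    """Hypothetical container for the three parts of a query cache key."""
    mode: str
    cache_type: str
    digest: str


def parse_query_cache_key(key: str) -> Optional[CacheKey]:
    """Return the parsed key, or None if it is not a query cache key."""
    parts = key.split(":", 2)
    if len(parts) != 3:
        return None
    mode, cache_type, digest = parts
    if mode not in QUERY_MODES or cache_type not in CACHE_TYPES or not digest:
        return None
    return CacheKey(mode, cache_type, digest)
```

Keys that belong to other caches, such as `default:extract:*`, are rejected by this check, which is what keeps a cleanup limited to query caches.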

3. Selective Cleanup Options

  • Delete all query caches (both query and keywords)
  • Delete query caches only (preserve keywords)
  • Delete keywords caches only (preserve query)
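
One way to express the three cleanup options is as a mapping from a scope name to the key prefixes it covers. This is a sketch only: the scope names (`all`, `query`, `keywords`) and the function `prefixes_for_scope` are assumptions made for illustration.

```python
# Modes as listed in the PR description; order is kept stable for readability.
QUERY_MODES = ("mix", "hybrid", "local", "global")

# Hypothetical scope names mapped to the cache types they remove.
SCOPE_TO_CACHE_TYPES = {
    "all": ("query", "keywords"),
    "query": ("query",),
    "keywords": ("keywords",),
}


def prefixes_for_scope(scope: str) -> list:
    """Return the '<mode>:<cache_type>:' prefixes selected by a cleanup scope."""
    try:
        cache_types = SCOPE_TO_CACHE_TYPES[scope]
    except KeyError:
        raise ValueError(f"unknown cleanup scope: {scope!r}") from None
    return [f"{mode}:{cache_type}:" for mode in QUERY_MODES for cache_type in cache_types]
```

With four modes, the `all` scope expands to eight prefixes and each single-type scope to four.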

4. Interactive Workflow

  • Storage type selection with configuration validation
  • Pre-cleanup statistics display (count by mode and cache type)
  • Cleanup type selection with confirmation
  • Real-time progress tracking with visual progress bars
  • Post-cleanup verification and reporting
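
The pre-cleanup statistics step (counts by mode and cache type) can be sketched with a `Counter`. The function name `count_by_mode_and_type` is hypothetical, and the sketch assumes the keys are already available as an iterable of strings, which may not be how each backend exposes them.

```python
from collections import Counter
from typing import Iterable

QUERY_MODES = {"mix", "hybrid", "local", "global"}
CACHE_TYPES = {"query", "keywords"}


def count_by_mode_and_type(keys: Iterable[str]) -> Counter:
    """Count query cache keys per (mode, cache_type); other keys are ignored."""
    counts = Counter()
    for key in keys:
        parts = key.split(":", 2)
        if len(parts) == 3 and parts[0] in QUERY_MODES and parts[1] in CACHE_TYPES:
            counts[(parts[0], parts[1])] += 1
    return counts
```

Running the same count again after deletion gives the before/after comparison used for verification.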

5. Robust Error Handling

  • Batch-level error tracking
  • Comprehensive error reports with type grouping
  • Success/failure statistics
  • Before/after comparison
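
A small record type is one way to collect batch-level results and group errors by type for the final report. The class `CleanupStats` and its fields below are illustrative assumptions, not the tool's actual data structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CleanupStats:
    """Hypothetical accumulator for batch outcomes during a cleanup run."""
    successful_batches: int = 0
    deleted_records: int = 0
    # Each entry: (batch index, exception type name, message)
    errors: List[Tuple[int, str, str]] = field(default_factory=list)

    def record_success(self, record_count: int) -> None:
        self.successful_batches += 1
        self.deleted_records += record_count

    def record_failure(self, batch_index: int, exc: Exception) -> None:
        self.errors.append((batch_index, type(exc).__name__, str(exc)))

    @property
    def failed_batches(self) -> int:
        return len(self.errors)

    def errors_by_type(self) -> Dict[str, List[int]]:
        """Group failed batch indices by exception type name."""
        grouped: Dict[str, List[int]] = {}
        for batch_index, error_type, _message in self.errors:
            grouped.setdefault(error_type, []).append(batch_index)
        return grouped
```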

Technical Implementation

Storage-Specific Optimizations:

  • JsonKVStorage: Batch deletion with proper update flag handling for persistence
  • RedisKVStorage: SCAN-based pattern matching with pipeline batching
  • PGKVStorage: Single optimized DELETE query with OR conditions
  • MongoKVStorage: Regex-based deleteMany operations per pattern
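
To make the PostgreSQL item concrete, here is a sketch that builds one DELETE statement with OR-joined LIKE conditions. The table name `LIGHTRAG_LLM_CACHE`, the `workspace` and `id` columns, and the `$n` placeholder style are assumptions for illustration; the statement the tool actually issues may differ.

```python
from typing import List, Sequence, Tuple


def build_delete_statement(
    workspace: str,
    prefixes: Sequence[str],
    table: str = "LIGHTRAG_LLM_CACHE",  # assumed table name
) -> Tuple[str, List[str]]:
    """Build a single parameterized DELETE covering every key prefix."""
    if not prefixes:
        raise ValueError("at least one key prefix is required")
    # $1 is the workspace; $2.. are the LIKE patterns, one per prefix.
    conditions = " OR ".join(f"id LIKE ${i}" for i in range(2, len(prefixes) + 2))
    sql = f"DELETE FROM {table} WHERE workspace = $1 AND ({conditions})"
    params = [workspace] + [f"{prefix}%" for prefix in prefixes]
    return sql, params
```

Prefixes such as `mix:query:` contain no `%` or `_`, so they need no escaping here; arbitrary user-supplied prefixes would.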

Key Design Patterns:

  1. Workspace Isolation: Respects workspace configuration (storage-specific > generic > default)
  2. Memory Efficiency: Processes deletions in configurable batches (default: 1000 records)
  3. Progress Tracking: Real-time visual feedback with progress bars and statistics
  4. Error Resilience: Continues processing even if individual batches fail
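
The batching and error-resilience patterns combine naturally into one loop. The sketch below assumes a caller-supplied async `delete_batch` callable (hypothetical; each backend would provide its own) and keeps going when a batch raises.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, Tuple


async def delete_in_batches(
    keys: Sequence[str],
    delete_batch: Callable[[Sequence[str]], Awaitable[None]],
    batch_size: int = 1000,  # default batch size stated in the PR description
) -> Tuple[int, List[Tuple[int, str]]]:
    """Delete keys batch by batch; return (deleted count, failed batches)."""
    deleted = 0
    failures: List[Tuple[int, str]] = []
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        batch_number = start // batch_size + 1
        try:
            await delete_batch(batch)
            deleted += len(batch)
        except Exception as exc:  # record the failure and continue with the next batch
            failures.append((batch_number, repr(exc)))
    return deleted, failures
```

A failed batch costs at most `batch_size` records of progress, and the failure list feeds directly into the final error report.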

Files Added

  • lightrag/tools/clean_llm_query_cache.py - Main cleanup tool implementation
  • lightrag/tools/README_CLEAN_LLM_QUERY_CACHE.md - Comprehensive user documentation

Usage Example

# Run the cleanup tool
python -m lightrag.tools.clean_llm_query_cache

# Interactive prompts guide the user through:
# 1. Storage type selection (JsonKVStorage, Redis, PostgreSQL, MongoDB)
# 2. View cache statistics by mode and type
# 3. Select cleanup scope (all, query only, keywords only)
# 4. Confirm deletion
# 5. Monitor progress and view final report

Configuration

The tool supports multiple configuration methods:

  1. Environment variables (highest priority)
  2. config.ini file (medium priority)
  3. Default values (lowest priority)
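
The three-level priority can be sketched with the standard library's `configparser`. The helper `resolve_setting` is hypothetical, and the `REDIS_URI` variable and `[redis]` section in the usage lines are illustrative examples rather than a confirmed list of the tool's settings.

```python
import configparser
import os
from typing import Mapping, Optional


def resolve_setting(
    env_name: str,
    section: str,
    option: str,
    default: str,
    config: configparser.ConfigParser,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Resolve a setting: environment variable, then config.ini, then default."""
    environ = os.environ if environ is None else environ
    if env_name in environ:
        return environ[env_name]
    # has_option returns False when the section itself is missing.
    if config.has_option(section, option):
        return config.get(section, option)
    return default


# Usage sketch with illustrative names:
config = configparser.ConfigParser()
config.read("config.ini")  # silently skipped if the file does not exist
redis_uri = resolve_setting("REDIS_URI", "redis", "uri", "redis://localhost:6379", config)
```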

Workspace configuration:

  • Storage-specific: POSTGRES_WORKSPACE, MONGODB_WORKSPACE, REDIS_WORKSPACE
  • Generic: WORKSPACE
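
The workspace priority (storage-specific > generic > default) might be resolved as follows. The function `resolve_workspace`, the empty-string default, and the choice to treat an empty variable as unset are assumptions of this sketch.

```python
import os
from typing import Mapping, Optional

# Storage-specific workspace variables named in the PR description.
STORAGE_WORKSPACE_ENV = {
    "PGKVStorage": "POSTGRES_WORKSPACE",
    "MongoKVStorage": "MONGODB_WORKSPACE",
    "RedisKVStorage": "REDIS_WORKSPACE",
}


def resolve_workspace(
    storage_name: str,
    environ: Optional[Mapping[str, str]] = None,
    default: str = "",  # assumed default; the tool's real default may differ
) -> str:
    """Pick the workspace: storage-specific variable, then WORKSPACE, then default."""
    environ = os.environ if environ is None else environ
    specific_var = STORAGE_WORKSPACE_ENV.get(storage_name)
    if specific_var and environ.get(specific_var):
        return environ[specific_var]
    if environ.get("WORKSPACE"):
        return environ["WORKSPACE"]
    return default
```

JsonKVStorage has no storage-specific variable in the list above, so in this sketch it falls through to `WORKSPACE` and then the default.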

Testing Considerations

The tool is designed and implemented with:

  • Async/await patterns for all storage operations
  • Lock management for JsonKVStorage
  • Update flag handling for persistence
  • Correct parameter types for all database operations

Manual testing is recommended with actual storage systems to verify cleanup across all backends.

Related Work

This tool complements the existing migrate_llm_cache.py tool, which handles migration of extraction/summary caches (default:extract:*, default:summary:*) between storage types. The new cleanup tool focuses specifically on query caches generated during RAG operations.

- Interactive cleanup workflow
- Supports all KV storage types
- Batch deletion with progress
- Comprehensive error reporting
- Preserves workspace isolation
@danielaskdd
Collaborator Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Already looking forward to the next diff.


@danielaskdd danielaskdd merged commit 3110ca5 into HKUDS:main Nov 9, 2025
1 check passed
@danielaskdd danielaskdd deleted the llm-cache-cleanup branch November 9, 2025 06:43