Feat: Add LLM Query Cache Cleanup Tool by danielaskdd · Pull Request #2335 · HKUDS/LightRAG

danielaskdd · 2025-11-09T05:39:33Z

Feat: Add LLM Query Cache Cleanup Tool

Overview

This PR introduces a new command-line tool for cleaning up LLM query cache entries stored in LightRAG's KV storage systems. The tool provides selective cleanup capabilities for query caches generated during RAG operations (modes: mix, hybrid, local, global).

Motivation

LLM query caches accumulate over time and can consume significant storage space. Users need a reliable way to:

Clean up outdated query caches across different storage backends
Selectively remove specific cache types (query vs keywords)
View statistics before and after cleanup
Handle cleanup errors gracefully

Implementation

Core Features

1. Multi-Storage Support

JsonKVStorage (file-based JSON)
RedisKVStorage (Redis database)
PGKVStorage (PostgreSQL)
MongoKVStorage (MongoDB)

2. Cache Type Management

Supports 4 query modes: mix, hybrid, local, global
Handles 2 cache types: query and keywords
Key format: <mode>:<cache_type>:<hash> (e.g., mix:query:abc123, global:keywords:def456)
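The key format above can be illustrated with a small parser. This is a minimal sketch, not the tool's actual code: the names `CacheKey` and `parse_query_cache_key` are hypothetical, and only the modes, cache types, and key layout come from the description.

```python
from typing import NamedTuple, Optional

QUERY_MODES = ("mix", "hybrid", "local", "global")
CACHE_TYPES = ("query", "keywords")


class CacheKey(NamedTuple):
    """Parsed form of a <mode>:<cache_type>:<hash> key (hypothetical helper type)."""
    mode: str
    cache_type: str
    digest: str


def parse_query_cache_key(key: str) -> Optional[CacheKey]:
    """Return the parsed key, or None if the key is not a query cache key."""
    parts = key.split(":", 2)
    if len(parts) != 3:
        return None
    mode, cache_type, digest = parts
    if mode not in QUERY_MODES or cache_type not in CACHE_TYPES or not digest:
        return None
    return CacheKey(mode, cache_type, digest)
```

Under this sketch, keys such as default:extract:* are rejected, so extraction and summary caches would be left untouched.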

3. Selective Cleanup Options

Delete all query caches (both query and keywords)
Delete query caches only (preserve keywords)
Delete keywords caches only (preserve query)
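One way to map the three cleanup options onto key prefixes is sketched below. The scope labels ("all", "query", "keywords") and the function name `prefixes_for_scope` are assumptions made for illustration. The tool's own option names may differ.

```python
from typing import List

QUERY_MODES = ("mix", "hybrid", "local", "global")

# Hypothetical scope labels mapped to the cache types they cover.
SCOPE_TO_CACHE_TYPES = {
    "all": ("query", "keywords"),
    "query": ("query",),
    "keywords": ("keywords",),
}


def prefixes_for_scope(scope: str) -> List[str]:
    """Return every '<mode>:<cache_type>:' prefix covered by a cleanup scope."""
    if scope not in SCOPE_TO_CACHE_TYPES:
        raise ValueError(f"unknown cleanup scope: {scope!r}")
    return [
        f"{mode}:{cache_type}:"
        for mode in QUERY_MODES
        for cache_type in SCOPE_TO_CACHE_TYPES[scope]
    ]
```

With four modes and two cache types, "all" yields eight prefixes and each single-type scope yields four.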

4. Interactive Workflow

Storage type selection with configuration validation
Pre-cleanup statistics display (count by mode and cache type)
Cleanup type selection with confirmation
Real-time progress tracking with visual progress bars
Post-cleanup verification and reporting
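The pre-cleanup statistics step (count by mode and cache type) could look roughly like the following. This is a sketch only: the function names and the table layout are assumptions, not the tool's actual output format.

```python
from collections import Counter
from typing import Iterable

QUERY_MODES = ("mix", "hybrid", "local", "global")
CACHE_TYPES = ("query", "keywords")


def count_by_mode_and_type(keys: Iterable[str]) -> Counter:
    """Count query cache keys per (mode, cache_type); other keys are ignored."""
    counts = Counter()
    for key in keys:
        parts = key.split(":", 2)
        if len(parts) == 3 and parts[0] in QUERY_MODES and parts[1] in CACHE_TYPES:
            counts[(parts[0], parts[1])] += 1
    return counts


def format_stats(counts: Counter) -> str:
    """Render the counts as a small fixed-width table (illustrative layout)."""
    lines = [f"{'mode':<8}{'query':>8}{'keywords':>10}"]
    for mode in QUERY_MODES:
        query_count = counts[(mode, "query")]
        keywords_count = counts[(mode, "keywords")]
        lines.append(f"{mode:<8}{query_count:>8}{keywords_count:>10}")
    lines.append(f"total: {sum(counts.values())}")
    return "\n".join(lines)
```

Running the same count again after deletion would give the before/after comparison used for verification.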

5. Robust Error Handling

Batch-level error tracking
Comprehensive error reports with type grouping
Success/failure statistics
Before/after comparison
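Batch-level error tracking with grouping by error type might be structured as below. The class `CleanupReport` and its fields are hypothetical. The sketch only shows one plausible shape for the report described above.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CleanupReport:
    """Accumulates per-batch outcomes for a final report (hypothetical structure)."""
    count_before: int
    deleted: int = 0
    successful_batches: int = 0
    failed_batches: int = 0
    # Each entry is (batch_index, error_type_name, message).
    errors: List[Tuple[int, str, str]] = field(default_factory=list)

    def record_success(self, batch_size: int) -> None:
        self.successful_batches += 1
        self.deleted += batch_size

    def record_failure(self, batch_index: int, exc: Exception) -> None:
        self.failed_batches += 1
        self.errors.append((batch_index, type(exc).__name__, str(exc)))

    def errors_by_type(self) -> Dict[str, List[int]]:
        """Group failed batch indices by exception type name."""
        grouped = defaultdict(list)
        for batch_index, error_type, _message in self.errors:
            grouped[error_type].append(batch_index)
        return dict(grouped)

    def summary(self, count_after: int) -> str:
        return (
            f"batches ok/failed: {self.successful_batches}/{self.failed_batches}, "
            f"entries before/after: {self.count_before}/{count_after}"
        )
```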

Technical Implementation

Storage-Specific Optimizations:

JsonKVStorage: Batch deletion with proper update flag handling for persistence
RedisKVStorage: SCAN-based pattern matching with pipeline batching
PGKVStorage: Single optimized DELETE query with OR conditions
MongoKVStorage: Regex-based deleteMany operations per pattern
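As an illustration of the single DELETE with OR conditions, the sketch below builds a parameterized statement from a list of key prefixes. The table name, the `workspace` and `id` column names, and the `$n` placeholder style are all assumptions for this example and are not taken from the PR.

```python
from typing import List, Tuple


def build_delete_query(table: str, workspace: str, prefixes: List[str]) -> Tuple[str, List[str]]:
    """Build one DELETE covering all prefixes, with $n-style placeholders.

    Assumes (hypothetically) that the cache table has `workspace` and `id`
    columns. The table name is interpolated directly, so it must come from
    trusted code, never from user input.
    """
    if not prefixes:
        raise ValueError("at least one key prefix is required")
    # $1 is the workspace; $2 onward are the LIKE patterns.
    conditions = " OR ".join(f"id LIKE ${i}" for i in range(2, len(prefixes) + 2))
    sql = f"DELETE FROM {table} WHERE workspace = $1 AND ({conditions})"
    params = [workspace] + [f"{prefix}%" for prefix in prefixes]
    return sql, params
```

Passing the patterns as parameters rather than formatting them into the SQL text keeps the statement safe even if a prefix ever contained quote characters.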

Key Design Patterns:

Workspace Isolation: Respects workspace configuration (storage-specific > generic > default)
Memory Efficiency: Processes deletions in configurable batches (default: 1000 records)
Progress Tracking: Real-time visual feedback with progress bars and statistics
Error Resilience: Continues processing even if individual batches fail
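The batching, progress, and error-resilience patterns can be combined in a loop like the one below. This is a sketch under stated assumptions: `delete_batch` stands in for whatever storage-specific coroutine performs the deletion, and the helper names and progress bar format are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Iterator, List, Tuple


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def progress_bar(done: int, total: int, width: int = 20) -> str:
    """Render a simple text progress bar such as '[##--] 1/2'."""
    filled = width * done // total if total else width
    return "[" + "#" * filled + "-" * (width - filled) + f"] {done}/{total}"


async def delete_in_batches(
    keys: List[str],
    delete_batch: Callable[[List[str]], Awaitable[None]],
    batch_size: int = 1000,
) -> Tuple[int, List[Tuple[int, str, str]]]:
    """Delete keys batch by batch, continuing past batches that fail."""
    deleted = 0
    failures: List[Tuple[int, str, str]] = []
    batches = list(chunked(keys, batch_size))
    for index, batch in enumerate(batches, start=1):
        try:
            await delete_batch(batch)
            deleted += len(batch)
        except Exception as exc:  # record the failure and keep going
            failures.append((index, type(exc).__name__, str(exc)))
        print(progress_bar(index, len(batches)))
    return deleted, failures
```

Catching the exception per batch means a single failing batch costs at most `batch_size` deletions rather than aborting the whole run.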

Files Added

lightrag/tools/clean_llm_query_cache.py - Main cleanup tool implementation
lightrag/tools/README_CLEAN_LLM_QUERY_CACHE.md - Comprehensive user documentation

Usage Example

# Run the cleanup tool
python -m lightrag.tools.clean_llm_query_cache

# Interactive prompts guide the user through:
# 1. Storage type selection (JsonKVStorage, Redis, PostgreSQL, MongoDB)
# 2. View cache statistics by mode and type
# 3. Select cleanup scope (all, query only, keywords only)
# 4. Confirm deletion
# 5. Monitor progress and view final report

Configuration

The tool supports multiple configuration methods:

Environment variables (highest priority)
config.ini file (medium priority)
Default values (lowest priority)
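The three-level priority can be sketched with the standard library's configparser. The function `resolve_setting` is hypothetical, and the section, option, and variable names in the usage note are placeholders, not names confirmed by the PR.

```python
import configparser
import os
from typing import Mapping, Optional


def resolve_setting(
    env_name: str,
    section: str,
    option: str,
    default: str,
    config: configparser.ConfigParser,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Resolve a setting: environment variable, then config.ini, then default."""
    env = os.environ if env is None else env
    if env_name in env:
        return env[env_name]
    # has_option returns False when the section itself is missing.
    if config.has_option(section, option):
        return config.get(section, option)
    return default
```

For example, with a config.ini containing a [redis] section and a uri option, `resolve_setting("REDIS_URI", "redis", "uri", "redis://localhost:6379", config)` would return the environment value if REDIS_URI is set, and otherwise fall back to the file and then the default.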

Workspace configuration:

Storage-specific: POSTGRES_WORKSPACE, MONGODB_WORKSPACE, REDIS_WORKSPACE
Generic: WORKSPACE
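The workspace precedence (storage-specific, then generic, then default) reduces to a short lookup. In this sketch the function name and the empty-string default are assumptions. Only the variable names come from the list above.

```python
import os
from typing import Mapping, Optional


def resolve_workspace(
    storage_env_name: str,
    env: Optional[Mapping[str, str]] = None,
    default: str = "",
) -> str:
    """Pick the workspace: storage-specific variable, then WORKSPACE, then default.

    Empty values are treated as unset (an assumption of this sketch).
    """
    env = os.environ if env is None else env
    return env.get(storage_env_name) or env.get("WORKSPACE") or default
```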

Testing Considerations

The tool is designed and implemented with:

Async/await patterns for all storage operations
Lock management for JsonKVStorage
Update flag handling for persistence
Parameter type correctness for all database operations

Manual testing is recommended with actual storage systems to verify cleanup across all backends.

Related Work

This tool complements the existing migrate_llm_cache.py tool, which handles migration of extraction/summary caches (default:extract:*, default:summary:*) between storage types. The new cleanup tool focuses specifically on query caches generated during RAG operations.

Interactive cleanup workflow
Supports all KV storage types
Batch deletion with progress
Comprehensive error reporting
Preserves workspace isolation

danielaskdd · 2025-11-09T05:39:48Z

@codex review

chatgpt-codex-connector · 2025-11-09T05:45:18Z

Codex Review: Didn't find any major issues. Already looking forward to the next diff.


Add LLM query cache cleanup tool for KV storage backends

1485cb8


Fix table alignment and add validation for empty cleanup selections

37b7118

danielaskdd merged commit 3110ca5 into HKUDS:main Nov 9, 2025
1 check passed

danielaskdd deleted the llm-cache-cleanup branch November 9, 2025 06:43
