Refact: Limit Vector Database Metadata Size #2240
Merged
danielaskdd merged 29 commits into HKUDS:main on Oct 21, 2025
- Add entity_chunks & relation_chunks storage
- Implement KEEP/FIFO limit strategies
- Update env.example with new settings
- Add migration for chunk tracking data
- Support all KV storage

• Add MAX_FILE_PATHS env variable
• Implement file path count limiting
• Support KEEP/FIFO strategies
• Add truncation placeholder
• Remove old build_file_path function

• Add truncate tooltip to source_id field
• Add visual truncation indicator (†)
• Bump API version to 0242

• Add has_placeholder tracking variable
• Detect placeholder patterns in paths
• Show + sign for truncated counts

…unctions
• Move VDB upserts into merge functions
• Fix early return data structure issues
• Update status messages (IGNORE_NEW → KEEP)
• Consolidate error handling paths
• Improve relationship content format

• Entity source IDs: 3 → 300
• Relation source IDs: 3 → 300
• File paths: 2 → 30

• Add numbered steps for clarity
• Improve early return handling
• Enhance file path limiting logic

• Standardize FIFO/KEEP truncation labels
• Update UI truncation text format

• Use proper Redis connection context
• Fix namespace pattern for key scanning
• Propagate storage check exceptions
• Remove defensive error swallowing

- Bump DEFAULT_MAX_FILE_PATHS to 100
- Add clarifying comment about display
Refact: Limit Vector Database Metadata Size
Problem Statement
In production deployments, entity and relation metadata can grow unbounded as documents are continuously ingested. The source_id (chunk IDs) and file_path fields in entities and relations can accumulate thousands of entries, leading to:
Solution Overview
This PR implements a configurable metadata size control system with two key features:
Both features support two strategies:
Key Features
1. Chunk Tracking System
entity_chunks and relation_chunks
2. Configurable Limits
New environment variables:
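As a rough illustration of how such settings might be read at startup, here is a minimal sketch. MAX_FILE_PATHS and the DEFAULT_MAX_FILE_PATHS value of 100 come from the commit notes; the LIMIT_METHOD variable name, its default, and the load_limits helper are hypothetical and may not match the names used in this PR.

```python
import os

DEFAULT_MAX_FILE_PATHS = 100  # value named in the commit notes
VALID_METHODS = ("KEEP", "FIFO")  # strategy names from the commit notes


def load_limits(env=None):
    """Read metadata limits from an environment-style mapping.

    `env` defaults to os.environ; passing a plain dict makes testing easy.
    """
    env = os.environ if env is None else env

    max_file_paths = int(env.get("MAX_FILE_PATHS", DEFAULT_MAX_FILE_PATHS))
    if max_file_paths < 1:
        raise ValueError("MAX_FILE_PATHS must be a positive integer")

    # LIMIT_METHOD is a hypothetical variable name used only for illustration;
    # the default of KEEP is likewise an arbitrary choice for this sketch.
    method = env.get("LIMIT_METHOD", "KEEP").upper()
    if method not in VALID_METHODS:
        raise ValueError(f"LIMIT_METHOD must be one of {VALID_METHODS}")

    return {"max_file_paths": max_file_paths, "method": method}
```

Validating at load time means a mistyped strategy name fails fast instead of silently falling back to a default during ingestion.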
3. Visual Indicators
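The commit notes mention a truncation placeholder, a has_placeholder tracking variable, and a + sign on truncated counts. The sketch below shows one way a display count could be derived from a stored file path field. The separator string, the placeholder prefix, and the count_label helper are all assumptions made for illustration; the actual placeholder format used by this PR is not shown in this page.

```python
SEP = "<SEP>"  # assumed separator between stored file paths
PLACEHOLDER_PREFIX = "...truncated"  # hypothetical placeholder marker


def count_label(file_path_field):
    """Return a display count such as "2" or "2+" for a stored path field.

    A trailing "+" signals that a truncation placeholder was found, so the
    real number of source files is larger than the number listed.
    """
    paths = [p for p in file_path_field.split(SEP) if p]
    has_placeholder = any(p.startswith(PLACEHOLDER_PREFIX) for p in paths)
    real_paths = [p for p in paths if not p.startswith(PLACEHOLDER_PREFIX)]
    label = str(len(real_paths))
    return label + "+" if has_placeholder else label
```

Counting only real paths keeps the placeholder itself from inflating the number shown to the user.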
4. Storage Backend Support
Implemented across all storage backends:
5. Enhanced Logging
Configuration Details
Default Values
Limit Strategies
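The commit notes name the two strategies KEEP and FIFO. A minimal sketch of how they could differ is given below, assuming that KEEP retains the oldest entries and ignores new ones once the limit is reached (suggested by the IGNORE_NEW → KEEP rename in the commit notes) and that FIFO evicts the oldest entries first. The apply_limit helper is hypothetical and is not taken from the PR's code.

```python
def apply_limit(existing, incoming, limit, method="KEEP"):
    """Merge two ID lists (oldest first) and cap the result at `limit`.

    KEEP: keep the earliest entries and ignore anything past the limit.
    FIFO: drop the earliest entries so the most recent ones survive.
    """
    if limit < 1:
        raise ValueError("limit must be a positive integer")

    # De-duplicate while preserving first-seen order.
    merged = list(dict.fromkeys([*existing, *incoming]))
    if len(merged) <= limit:
        return merged

    if method == "KEEP":
        return merged[:limit]
    if method == "FIFO":
        return merged[-limit:]
    raise ValueError(f"unknown limit method: {method!r}")
```

For example, with existing IDs `["a", "b"]`, incoming IDs `["c", "d"]`, and a limit of 3, KEEP yields `["a", "b", "c"]` while FIFO yields `["b", "c", "d"]`.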
Breaking Changes
None. The changes are fully backward compatible:
Performance Impact
Positive impacts:
Considerations:
Migration Path
For existing deployments:
.env with new configuration variables (optional)