Hotfix: Preserve ordering in get_by_ids methods across all storage implementations #2195
Merged
danielaskdd merged 5 commits into HKUDS:main on Oct 11, 2025
…tions
- Fix result ordering in vector stores
- Update KV storage get_by_ids methods
- Maintain order in doc status storage
- Return None for missing IDs
Preserve ordering in get_by_ids methods across all storage implementations
🎯 Problem
The `get_by_ids` function in certain storage implementations returns results in an order that does not match the input IDs list, causing a misalignment between retrieved text blocks and their corresponding IDs. This issue affects the correctness of data returned by the `aquery_data` function and the `/aquery_data` API endpoint.
📝 Changes
Modified `get_by_ids` implementations in 8 storage backends to preserve input order and handle missing IDs consistently:
Modified Files:
- lightrag/kg/deprecated/chroma_impl.py
- lightrag/kg/json_doc_status_impl.py
- lightrag/kg/milvus_impl.py
- lightrag/kg/mongo_impl.py
- lightrag/kg/nano_vector_db_impl.py
- lightrag/kg/postgres_impl.py
- lightrag/kg/qdrant_impl.py
- lightrag/kg/redis_impl.py
- lightrag/kg/faiss_impl.py
Implementation Pattern:
All implementations now follow a consistent 3-step pattern:
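
One plausible reading of such a pattern, sketched here as an assumption rather than as the code from this PR: fetch the rows from the backend, index them by ID, then walk the input IDs in order and emit `None` for anything not found. The helper name `reorder_by_ids` and the `id` field name are hypothetical and chosen only for illustration.

```python
from typing import Any, Optional


def reorder_by_ids(
    ids: list[str],
    rows: list[dict[str, Any]],
    id_field: str = "id",  # hypothetical field name; real backends differ
) -> list[Optional[dict[str, Any]]]:
    """Align backend rows with the requested ids, padding gaps with None."""
    # Step 1 (assumed): the backend has already returned `rows`, in any order.
    # Step 2: index the rows by their id for O(1) lookup.
    by_id = {row[id_field]: row for row in rows}
    # Step 3: walk the input ids in order; dict.get yields None for missing ids.
    return [by_id.get(doc_id) for doc_id in ids]


# Rows come back out of order and "b" is missing entirely.
rows_from_backend = [
    {"id": "c", "content": "third"},
    {"id": "a", "content": "first"},
]
ordered = reorder_by_ids(["a", "b", "c"], rows_from_backend)
print(ordered)
```

The returned list always has the same length as `ids`, so position `i` in the result corresponds to `ids[i]`.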
API Contract Change
Before:
After:
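
A minimal sketch of the contract difference, based only on the behaviour described in this PR (results in backend order with missing IDs omitted, versus results in input order with `None` placeholders). The in-memory `STORE` and both function names are hypothetical, and the functions are synchronous for brevity; this is not the code from the PR.

```python
from typing import Any, Optional

# Hypothetical in-memory stand-in for a storage backend.
STORE: dict[str, dict[str, Any]] = {
    "a": {"content": "first"},
    "c": {"content": "third"},
}


def get_by_ids_before(ids: list[str]) -> list[dict[str, Any]]:
    # Assumed old behaviour: results follow the backend's own order
    # and missing ids are silently dropped.
    wanted = set(ids)
    return [record for key, record in STORE.items() if key in wanted]


def get_by_ids_after(ids: list[str]) -> list[Optional[dict[str, Any]]]:
    # New contract: one entry per input id, in input order, None if missing.
    return [STORE.get(doc_id) for doc_id in ids]


ids = ["c", "b", "a"]
before = get_by_ids_before(ids)
after = get_by_ids_after(ids)
print(len(before), len(after))
```

With the old shape, `before[0]` does not belong to `ids[0]`, which is the misalignment described under Problem. With the new shape, `after[i]` always belongs to `ids[i]`.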
Impact on Consumers
✅ Compatible:
- `for i, result in enumerate(results)`
- `if results[i]: process(results[i])`
❌ Requires Updates:
- `for r in results: r['field']` (raises `TypeError` when an entry is `None`)
- `len(results) == len(found_items)` (the result length now equals the number of input IDs)
Existing Code Compatibility
All 4 existing call sites in `lightrag/operate.py` already have proper None checks:
- `_get_cached_extraction_results` (line 1304): `if chunk_data and isinstance(chunk_data, dict)`
- `_get_cached_extraction_results` (line 1317): `if cache_entry is not None`
- `_find_related_text_unit_from_entities` (line 3959): `if chunk_data is not None and "content" in chunk_data`
- `_find_related_text_unit_from_relations` (line 4173): `if chunk_data is not None and "content" in chunk_data`
Conclusion: This change is backward compatible with the existing codebase.
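
For consumer code that does need updating, one way to adapt is to pair each ID with its result and skip `None` entries, using the same guard as the `operate.py` call sites listed above. The function name `collect_contents` and the sample data are hypothetical; this is a sketch, not code from the repository.

```python
from typing import Any, Optional


def collect_contents(
    ids: list[str],
    results: list[Optional[dict[str, Any]]],
) -> dict[str, str]:
    """Map each found id to its content, skipping ids the store did not have."""
    contents: dict[str, str] = {}
    # zip is safe here because the result list is aligned with ids by position.
    for chunk_id, chunk_data in zip(ids, results):
        if chunk_data is not None and "content" in chunk_data:
            contents[chunk_id] = chunk_data["content"]
    return contents


ids = ["a", "b", "c"]
results = [{"content": "first"}, None, {"content": "third"}]
found = collect_contents(ids, results)
print(found)
```

Counting found items then becomes `len(found)` rather than `len(results)`.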
🎯 Benefits
- Results follow the input `ids` order exactly
- `None` values for missing IDs instead of silent omission
- `| None` union type for better IDE support