Hotfix: Preserve ordering in get_by_ids methods across all storage implementations by danielaskdd · Pull Request #2195 · HKUDS/LightRAG

danielaskdd · 2025-10-11T04:45:55Z

Preserve ordering in get_by_ids methods across all storage implementations

🎯 Problem

The get_by_ids function in certain storage implementations returns results in an order that does not match the input IDs list, causing a misalignment between retrieved text blocks and their corresponding IDs. This issue affects the correctness of data returned by the aquery_data function and the /aquery_data API endpoint.
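The misalignment can be illustrated with a minimal sketch. The IDs, field names, and rows below are hypothetical and do not come from any LightRAG backend; they only show what happens when a caller pairs IDs with rows by position while the backend returns rows in its own order.

```python
# Hypothetical data: the backend returns rows in its own order, not the caller's.
requested_ids = ["chunk-2", "chunk-1"]
backend_rows = [
    {"id": "chunk-1", "content": "text of chunk 1"},
    {"id": "chunk-2", "content": "text of chunk 2"},
]

# A caller that pairs IDs with rows by position attaches the wrong text to each ID.
paired = {
    chunk_id: row["content"] for chunk_id, row in zip(requested_ids, backend_rows)
}

# IDs whose positional partner is actually a different record.
mismatched = [
    chunk_id
    for chunk_id, row in zip(requested_ids, backend_rows)
    if row["id"] != chunk_id
]
```

Here every requested ID ends up paired with another record's content, which is the kind of mismatch described above.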

📝 Changes

Modified get_by_ids implementations in the 9 storage backend files listed below to preserve input order and handle missing IDs consistently:

Modified Files:

lightrag/kg/deprecated/chroma_impl.py
lightrag/kg/json_doc_status_impl.py
lightrag/kg/milvus_impl.py
lightrag/kg/mongo_impl.py
lightrag/kg/nano_vector_db_impl.py
lightrag/kg/postgres_impl.py
lightrag/kg/qdrant_impl.py
lightrag/kg/redis_impl.py
lightrag/kg/faiss_impl.py

Implementation Pattern:

All implementations now follow a consistent 3-step pattern:

# 1. Fetch data from storage
results = await storage.find({"_id": {"$in": ids}})

# 2. Build lookup map
result_map: dict[str, dict[str, Any]] = {}
for result in results:
    result_map[str(result["_id"])] = result

# 3. Preserve input order with None for missing IDs
ordered_results: list[dict[str, Any] | None] = []
for id_value in ids:
    ordered_results.append(result_map.get(str(id_value)))

return ordered_results
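For reference, steps 2 and 3 can be written as a standalone, synchronous helper. This is only a sketch: the name reorder_by_ids, the key parameter, and the sample records are illustrative assumptions, not code from the PR, and the fetch in step 1 is replaced by a hard-coded list.

```python
from typing import Any, Iterable, Optional


def reorder_by_ids(
    ids: list[str], records: Iterable[dict[str, Any]], key: str = "_id"
) -> list[Optional[dict[str, Any]]]:
    """Return one entry per input ID, in input order, with None for misses."""
    # Build the lookup map; keys are stringified so int and str IDs compare equal.
    record_map = {str(record[key]): record for record in records}
    # Walk the input IDs so the output order and length follow the request.
    return [record_map.get(str(id_value)) for id_value in ids]


# Hypothetical backend output: arbitrary order, and "b" is missing.
fetched = [{"_id": "c", "content": "third"}, {"_id": "a", "content": "first"}]
ordered = reorder_by_ids(["a", "b", "c"], fetched)
```

Building the map costs one pass over the fetched records, and the final pass over ids is what fixes both the order and the length of the output.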

⚠️ Breaking Changes

API Contract Change

Before:

get_by_ids([1, 2, 3]) → [{id:1}, {id:3}]  # Missing ID omitted
len(result) may be < len(ids)

After:

get_by_ids([1, 2, 3]) → [{id:1}, None, {id:3}]  # None for missing IDs
len(result) == len(ids) always
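The new contract can be expressed as a small check, for example in a backend test. The function satisfies_contract and the "id" key are hypothetical names chosen for this sketch; the two sample lists mirror the Before and After examples above.

```python
from typing import Any, Optional, Sequence


def satisfies_contract(
    ids: Sequence[Any],
    results: Sequence[Optional[dict[str, Any]]],
    key: str = "id",
) -> bool:
    """True if results has one slot per ID and each non-None slot matches its ID."""
    if len(results) != len(ids):
        return False
    return all(
        result is None or str(result[key]) == str(id_value)
        for id_value, result in zip(ids, results)
    )


old_style = [{"id": 1}, {"id": 3}]        # missing ID 2 silently dropped
new_style = [{"id": 1}, None, {"id": 3}]  # missing ID 2 kept as None

old_ok = satisfies_contract([1, 2, 3], old_style)
new_ok = satisfies_contract([1, 2, 3], new_style)
```

Only the new-style list passes, because the old-style list is shorter than the list of requested IDs.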

Impact on Consumers

✅ Compatible:

Code using index-based iteration: for i, result in enumerate(results)
Code checking individual results: if results[i]: process(results[i])

❌ Requires Updates:

Code assuming all results are non-None: for r in results: r['field']
Code assuming len(results) == len(found_items)
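A consumer that previously assumed every result was present might be updated along these lines. This is a sketch under assumptions: collect_contents is a hypothetical helper, and the "content" field is borrowed from the None checks quoted from lightrag/operate.py in the next section.

```python
from typing import Any, Optional


def collect_contents(
    ids: list[str], results: list[Optional[dict[str, Any]]]
) -> dict[str, str]:
    """Map each found ID to its content, skipping missing records."""
    contents: dict[str, str] = {}
    for chunk_id, chunk in zip(ids, results):
        # Missing IDs now arrive as None instead of being omitted.
        if chunk is None or "content" not in chunk:
            continue
        contents[chunk_id] = chunk["content"]
    return contents


found = collect_contents(
    ["a", "b", "c"],
    [{"content": "first"}, None, {"content": "third"}],
)
```

Because results and ids now have the same length, zip pairs each ID with its own slot, and the None check is the only change such a loop needs.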

Existing Code Compatibility

All 4 existing call sites in lightrag/operate.py already have proper None checks:

✅ _get_cached_extraction_results (line 1304): if chunk_data and isinstance(chunk_data, dict)
✅ _get_cached_extraction_results (line 1317): if cache_entry is not None
✅ _find_related_text_unit_from_entities (line 3959): if chunk_data is not None and "content" in chunk_data
✅ _find_related_text_unit_from_relations (line 4173): if chunk_data is not None and "content" in chunk_data

Conclusion: Because every existing call site already handles None, this change is backward compatible with the existing codebase.

🎯 Benefits

Predictable Order: Results match input ids order exactly
1:1 Correspondence: Easy to map results back to requests
Consistent Behavior: All storage backends behave identically
Missing ID Handling: Explicit None values for missing IDs instead of silent omission
Type Safety: Clear | None union type for better IDE support


danielaskdd added 5 commits October 11, 2025 12:37

Preserve ordering in get_by_ids methods across all storage implementa…

9be22dd

…tions - Fix result ordering in vector stores - Update KV storage get_by_ids methods - Maintain order in doc status storage - Return None for missing IDs

Revert core version to 1.4.9..2

7cddd56

Update pymilvus to >=2.6.2 and add protobuf compatibility constraint

49197fb

Fix linting

b7216ed

Fix get_by_ids to return None for missing records consistently

e1e4f1b

danielaskdd merged commit 8239783 into HKUDS:main Oct 11, 2025
1 check passed

danielaskdd merged 5 commits into HKUDS:main from danielaskdd:hotfix-postgres
