Fix: Add file_path field to full_docs storage #2171

Merged

danielaskdd merged 1 commit into HKUDS:main from danielaskdd:doc-name-in-full-docs on Oct 5, 2025

@danielaskdd (Collaborator)

Fix: Add file_path field to full_docs storage

Summary

This PR adds a file_path field to the full_docs storage layer so that each stored document records the path of its source file, which improves document tracking and supports citations. The implementation works across all storage backends (JSON, Redis, MongoDB, PostgreSQL), with special handling for PostgreSQL, which reuses its existing doc_name column.

Related issue: #2167

Changes Made

1. Core Pipeline (lightrag/lightrag.py)

Modified the apipeline_enqueue_documents function:

full_docs_data = {
    doc_id: {
        "content": contents[doc_id]["content"],
        "file_path": contents[doc_id]["file_path"]  # Added
    }
    for doc_id in new_docs.keys()
}
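
To see the effect of this change in isolation, here is a minimal, self-contained sketch of the same comprehension run over hypothetical sample data. The `contents` and `new_docs` dictionaries are stand-ins for the pipeline's internal state, and `build_full_docs_data` is a hypothetical helper, not part of LightRAG.

```python
from typing import Dict


def build_full_docs_data(
    contents: Dict[str, Dict[str, str]],
    new_docs: Dict[str, dict],
) -> Dict[str, Dict[str, str]]:
    """Build full_docs records (content + file_path) for the docs in new_docs only."""
    return {
        doc_id: {
            "content": contents[doc_id]["content"],
            "file_path": contents[doc_id]["file_path"],  # the field this PR adds
        }
        for doc_id in new_docs.keys()
    }


# Hypothetical sample data: only doc-1 is treated as new.
contents = {
    "doc-1": {"content": "First document.", "file_path": "a.txt"},
    "doc-2": {"content": "Second document.", "file_path": "b.txt"},
}
new_docs = {"doc-1": {}}

full_docs_data = build_full_docs_data(contents, new_docs)
print(full_docs_data)  # {'doc-1': {'content': 'First document.', 'file_path': 'a.txt'}}
```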

Modified the ainsert_custom_chunks function:

new_docs = {doc_key: {"content": full_text, "file_path": file_path}}
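
After this change both code paths write full_docs values of the same shape. One way to make that shape explicit is a `TypedDict`; the sketch below is an illustration under that assumption, and `FullDocRecord` and `make_custom_chunks_record` are hypothetical names rather than LightRAG's API.

```python
from typing import Dict, TypedDict


class FullDocRecord(TypedDict):
    """Shape of one full_docs value after this change (hypothetical type name)."""

    content: str
    file_path: str


def make_custom_chunks_record(
    doc_key: str, full_text: str, file_path: str
) -> Dict[str, FullDocRecord]:
    # Mirrors the ainsert_custom_chunks change: file_path is stored alongside content.
    return {doc_key: {"content": full_text, "file_path": file_path}}


record = make_custom_chunks_record("doc-abc", "Chunk one. Chunk two.", "notes.md")
print(record["doc-abc"]["file_path"])  # notes.md
```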

2. PostgreSQL Storage (lightrag/kg/postgres_impl.py)

SQL Template Updates:

  • upsert_doc_full: Maps file_path → doc_name for INSERT/UPDATE operations
  • get_by_id_full_docs: Returns doc_name as file_path for consistency
  • get_by_ids_full_docs: Returns doc_name as file_path for batch operations
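
The following sketch uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL to illustrate the template logic: an upsert that binds the application's `file_path` value to the `doc_name` column, and a read that aliases `COALESCE(doc_name, '')` back to `file_path`. The table name `doc_full`, its columns, and the SQL text are assumptions for illustration, not LightRAG's actual templates; SQLite uses `?` placeholders where PostgreSQL uses `$1`, `$2`, and so on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
# Hypothetical table modeled on the described schema; doc_name is nullable.
conn.execute(
    """CREATE TABLE doc_full (
           id TEXT NOT NULL,
           workspace TEXT NOT NULL,
           content TEXT,
           doc_name TEXT,
           PRIMARY KEY (workspace, id)
       )"""
)

# Write path: the application's file_path value goes into the doc_name column.
UPSERT_DOC_FULL = """INSERT INTO doc_full (id, content, doc_name, workspace)
VALUES (?, ?, ?, ?)
ON CONFLICT (workspace, id) DO UPDATE
SET content = excluded.content, doc_name = excluded.doc_name"""

# Read path: doc_name is returned under the application-level name file_path,
# and NULL (e.g. a row written before this change) becomes ''.
GET_BY_ID_FULL_DOCS = """SELECT id, content, COALESCE(doc_name, '') AS file_path
FROM doc_full WHERE workspace = ? AND id = ?"""

conn.execute(UPSERT_DOC_FULL, ("doc-1", "v1", "a.txt", "default"))
conn.execute(UPSERT_DOC_FULL, ("doc-1", "v2", "b.txt", "default"))  # updates in place
conn.execute(
    "INSERT INTO doc_full (id, content, workspace) VALUES (?, ?, ?)",
    ("legacy", "old text", "default"),  # simulates a row with no doc_name
)

new_row = dict(conn.execute(GET_BY_ID_FULL_DOCS, ("default", "doc-1")).fetchone())
old_row = dict(conn.execute(GET_BY_ID_FULL_DOCS, ("default", "legacy")).fetchone())
print(new_row)  # {'id': 'doc-1', 'content': 'v2', 'file_path': 'b.txt'}
print(old_row)  # {'id': 'legacy', 'content': 'old text', 'file_path': ''}
```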

PGKVStorage.upsert() method:

elif is_namespace(self.namespace, NameSpace.KV_STORE_FULL_DOCS):
    _data = {
        "id": k,
        "content": v["content"],
        "doc_name": v.get("file_path", ""),  # Field mapping
        "workspace": self.workspace,
    }

3. Other Storage Implementations

No changes required for:

  • JsonKVStorage: JSON serialization automatically handles new fields
  • RedisKVStorage: JSON serialization in Redis automatically stores new fields
  • MongoKVStorage: Document-based storage automatically accommodates new fields
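
These backends need no changes because they serialize each value as a whole: whatever keys the dictionary holds are written and read back as-is. A minimal sketch with the standard `json` module; the in-memory `store` dict and the `upsert`/`get_by_id` functions are hypothetical stand-ins for a JSON file or Redis value, not LightRAG code.

```python
import json
from typing import Any, Dict

# Hypothetical stand-in for a key-value backend that stores JSON strings.
store: Dict[str, str] = {}


def upsert(data: Dict[str, Dict[str, Any]]) -> None:
    for key, value in data.items():
        # Every key in value is serialized, including newly added ones like file_path.
        store[key] = json.dumps(value)


def get_by_id(key: str) -> Dict[str, Any]:
    return json.loads(store[key])


upsert({"doc-1": {"content": "Hello", "file_path": "hello.txt"}})
print(get_by_id("doc-1"))  # {'content': 'Hello', 'file_path': 'hello.txt'}
```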

Technical Details

Field Mapping Strategy

PostgreSQL uses a bidirectional mapping approach:

  • Application → Database: file_path → doc_name
  • Database → Application: doc_name → file_path

This preserves the existing PostgreSQL schema while maintaining a consistent application-layer interface.
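
The bidirectional mapping can be expressed as a pair of pure functions. This is a minimal sketch assuming rows are plain dictionaries; `to_db_row` and `from_db_row` are hypothetical names, since in the PR the logic lives inside `PGKVStorage.upsert()` and the SQL templates.

```python
from typing import Any, Dict


def to_db_row(doc_id: str, doc: Dict[str, Any], workspace: str) -> Dict[str, Any]:
    """Application -> database: file_path is stored in the doc_name column."""
    return {
        "id": doc_id,
        "content": doc["content"],
        "doc_name": doc.get("file_path", ""),  # tolerate callers that omit it
        "workspace": workspace,
    }


def from_db_row(row: Dict[str, Any]) -> Dict[str, str]:
    """Database -> application: doc_name is exposed as file_path, NULL as ''."""
    return {
        "content": row["content"] or "",
        "file_path": row.get("doc_name") or "",  # same effect as COALESCE(doc_name, '')
    }


doc = {"content": "Hello", "file_path": "hello.txt"}
row = to_db_row("doc-1", doc, workspace="default")
assert from_db_row(row) == doc  # round trip preserves the application view
print(row["doc_name"])  # hello.txt
```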

Backward Compatibility

Fully backward compatible:

  • Existing documents without a file_path return an empty string ("")
  • PostgreSQL query uses COALESCE(doc_name, '') as file_path to handle NULL values
  • No database migration required
  • All storage backends gracefully handle missing fields
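
On the read side, callers can treat records written before and after this change uniformly by defaulting the missing key. A minimal sketch, assuming records are plain dictionaries as returned by the JSON, Redis, or MongoDB backends; `get_file_path` is a hypothetical helper, not part of LightRAG.

```python
from typing import Any, Dict, Optional


def get_file_path(record: Optional[Dict[str, Any]]) -> str:
    """Return the stored file_path, or '' for missing records, legacy records, and None values."""
    if record is None:
        return ""
    return record.get("file_path") or ""


legacy = {"content": "Stored before the change"}
current = {"content": "Stored after the change", "file_path": "report.pdf"}
print([get_file_path(r) for r in (legacy, current, None)])  # ['', 'report.pdf', '']
```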

Impact

  • Breaking Changes: None
  • API Changes: None (internal storage only)
  • Performance Impact: Negligible (one additional field per document)
  • Database Schema: No migration needed (uses existing PostgreSQL doc_name column)

- Store file_path in full_docs storage
- Update the PostgreSQL implementation by mapping file_path to doc_name
- Other storage implementations automatically handle the new field
danielaskdd merged commit 1b27470 into HKUDS:main on Oct 5, 2025
1 check passed
danielaskdd mentioned this pull request on Oct 5, 2025