Merged
Changes from 90 commits
Commits
106 commits
5c10d3d
feat: enhance EmbeddingFunc with model_name support
BukeLy Nov 18, 2025
13f2440
feat: enhance BaseVectorStorage for model isolation
BukeLy Nov 18, 2025
df5aacb
feat: Qdrant model isolation and auto-migration
BukeLy Nov 19, 2025
ad68624
feat: PostgreSQL model isolation and auto-migration
BukeLy Nov 19, 2025
7dc1f83
fix: PostgreSQL read methods and delete_entity_relation bugs
BukeLy Nov 19, 2025
a0dfb47
docs: add multi-model vector storage isolation demo
BukeLy Nov 19, 2025
4c12301
fix: correct parameter passing in delete_entity_relation
BukeLy Nov 19, 2025
209dadc
ci: add feature branch testing workflow
BukeLy Nov 19, 2025
c32e6a4
test: add E2E tests with real PostgreSQL and Qdrant services
BukeLy Nov 19, 2025
d89849c
fix: E2E test fixture scope mismatch
BukeLy Nov 19, 2025
47fd7ea
fix: add required connection retry configs to E2E tests
BukeLy Nov 19, 2025
dc20615
test: refactor E2E tests using complete LightRAG instances
BukeLy Nov 19, 2025
c7e7b34
test: add Qdrant legacy migration E2E test
BukeLy Nov 19, 2025
66a0dfe
fix: resolve E2E test failures in CI
BukeLy Nov 19, 2025
722f639
fix: remove Qdrant health check in E2E workflow
BukeLy Nov 19, 2025
01bdaac
refactor: optimize batch insert handling in PGVectorStorage
BukeLy Nov 19, 2025
38f41da
fix: remove non-existent storage kwargs in E2E tests
BukeLy Nov 19, 2025
bef7577
fix: correct PostgreSQL environment variable name in E2E workflow
BukeLy Nov 19, 2025
6737ec0
fix: improve Qdrant wait strategy in E2E tests
BukeLy Nov 19, 2025
bf176b3
fix: correct attribute access in E2E tests
BukeLy Nov 19, 2025
519f7f6
fix: handle wrapped embedding_func and lock flag logic
BukeLy Nov 19, 2025
fa7a43a
fix: preserve EmbeddingFunc object in global_config
BukeLy Nov 19, 2025
5d95473
fix: correct Qdrant legacy_namespace for data migration
BukeLy Nov 19, 2025
e842327
fix: replace db.fetch with db.query for PostgreSQL migration
BukeLy Nov 19, 2025
e9f6ced
fix: use NetworkXStorage for E2E tests (AGE extension not available i…
BukeLy Nov 19, 2025
088b986
style: fix lint issues (trailing whitespace and formatting)
BukeLy Nov 19, 2025
65ff9b3
style: fix lint errors in E2E test file
BukeLy Nov 19, 2025
6bef407
style: fix lint errors (trailing whitespace and formatting)
BukeLy Nov 19, 2025
3979095
feat: implement vector storage model isolation and legacy migration
BukeLy Nov 19, 2025
df7a8f2
fix: add backward compatibility for Qdrant legacy collection detection
BukeLy Nov 19, 2025
19caf9f
test: add comprehensive E2E migration tests for Qdrant and complete u…
BukeLy Nov 19, 2025
84ff11f
fix: add safety check for empty model_suffix in PostgreSQL vector sto…
BukeLy Nov 19, 2025
42df825
fix: handle empty model_suffix in Qdrant collection naming
BukeLy Nov 19, 2025
7d0c356
fix: correct assert syntax in test_empty_model_suffix to prevent fals…
BukeLy Nov 19, 2025
982b63c
fix: correct AsyncPG parameter passing in PostgreSQL migration to pre…
BukeLy Nov 19, 2025
0508ad7
fix: prevent offline tests from failing due to missing E2E dependencies
BukeLy Nov 19, 2025
d12c149
chore: remove internal analysis document from PR
BukeLy Nov 19, 2025
8d9b6a6
fix: use actual embedding_dim instead of environment variable
BukeLy Nov 19, 2025
e24b2ed
fix: Prioritize workspace-specific legacy collections in Qdrant migra…
BukeLy Nov 19, 2025
48f6511
style: Apply ruff-format to qdrant_impl.py
BukeLy Nov 19, 2025
cedb3d4
fix: pass workspace to LightRAG instance instead of vector_db_storage…
BukeLy Nov 19, 2025
b29f32b
fix: correct PostgreSQL migration parameter passing
BukeLy Nov 19, 2025
4e86da2
fix: update PostgreSQL migration mock to match actual execute() signa…
BukeLy Nov 19, 2025
31e3ad1
refactor: remove redundant test files
BukeLy Nov 20, 2025
8386ea0
refactor: unify PostgreSQL and Qdrant migration logic for consistency
BukeLy Nov 20, 2025
c89b0ee
fix: specify conflict target in PostgreSQL ON CONFLICT clause
BukeLy Nov 20, 2025
e1e1080
test: add E2E tests for dimension mismatch scenarios
BukeLy Nov 20, 2025
e0767b1
fix: correct Qdrant point ID type in dimension mismatch E2E test
BukeLy Nov 20, 2025
5180c1e
feat: implement dimension compatibility checks for PostgreSQL and Qdr…
BukeLy Nov 20, 2025
8077c8a
style: fix lint errors in test files
BukeLy Nov 20, 2025
e89c17c
fix: restore uv.lock revision 3 and fix code formatting
BukeLy Nov 20, 2025
44e8be1
style: apply ruff formatting fixes to test_e2e_multi_instance.py
BukeLy Nov 20, 2025
f69cf9b
fix: prevent vector dimension mismatch crashes and data loss on no-su…
BukeLy Nov 23, 2025
cfc6587
fix: prevent race conditions and cross-workspace data leakage in migr…
BukeLy Nov 23, 2025
49bbb3a
test: add E2E test for workspace migration isolation
BukeLy Nov 23, 2025
204a253
fix: prevent double-release in UnifiedLock.__aexit__ error recovery
BukeLy Nov 23, 2025
16fff35
fix: prevent data loss in PostgreSQL migration and add doc_status tab…
BukeLy Nov 23, 2025
e2d68ad
style: apply ruff formatting to test files
BukeLy Nov 23, 2025
510baeb
fix: correct PostgreSQL execute() parameter format in workspace cleanup
BukeLy Nov 23, 2025
3b8a1e6
style: apply ruff formatting fixes to test files
BukeLy Nov 23, 2025
a8f5c9b
fix: migrate workspace data in PostgreSQL Case 1 to prevent data loss
BukeLy Nov 25, 2025
0fb7c5b
test: add unit test for Case 1 sequential workspace migration bug
BukeLy Nov 25, 2025
cf68cdf
refactor: improve PostgreSQL migration code quality
BukeLy Nov 25, 2025
19ab979
Merge branch 'main' into feature/vector-model-isolation
danielaskdd Dec 12, 2025
1b62ec9
refactor(Qdrant): simplify suffix generation and improve migration logic
danielaskdd Dec 16, 2025
6a9e368
Rename QdrantMigrationError to DataMigrationError for generalization
danielaskdd Dec 16, 2025
bf618fc
Refactor Qdrant setup and migration logic
danielaskdd Dec 19, 2025
0ae60d3
Improve Qdrant migration checks and verification logic
danielaskdd Dec 19, 2025
ada5f10
Optimize Postgres batch operations and refine workspace migration logic
danielaskdd Dec 19, 2025
37e4d94
Add vector dimension validation and storage safety checks
danielaskdd Dec 19, 2025
343ccac
Add 'd' suffix to dimensions in migration error message
danielaskdd Dec 19, 2025
a3b33bb
Remove E2E tests and update migration unit tests
danielaskdd Dec 19, 2025
e9003f3
Move shared lock validation to factory functions and fix test formatting
danielaskdd Dec 19, 2025
1c083c6
Remove redundant pytest.mark.asyncio decorators
danielaskdd Dec 19, 2025
e77a506
Add workspace filtering to Qdrant legacy migration
danielaskdd Dec 19, 2025
4ac5ec4
Improve Qdrant workspace detection via payload sampling
danielaskdd Dec 19, 2025
27863a6
Suppress empty warning for legacy tables in PostgreSQL if legacy and …
danielaskdd Dec 19, 2025
73c3c41
Drop Python 3.13 from tests and reformat code
danielaskdd Dec 19, 2025
c1ed2e3
Handle diverse vector types in Postgres storage
danielaskdd Dec 19, 2025
93ea50c
Restrict Qdrant legacy scroll filter to specific workspace
danielaskdd Dec 19, 2025
dfe628a
Use keyset pagination for PostgreSQL migration
danielaskdd Dec 19, 2025
3456818
Wrap inner embedding func to preserve attributes
danielaskdd Dec 19, 2025
81a0d63
feat: add Qdrant legacy data prep tool for migration tests
danielaskdd Dec 19, 2025
864131a
Enforce embedding_func validation in BaseVectorStorage
danielaskdd Dec 19, 2025
2073f95
Add validation for PostgreSQL table name length
danielaskdd Dec 19, 2025
911585f
Refactor Qdrant deletion logic for safety and scalability
danielaskdd Dec 19, 2025
85e8e33
Fix string vector parsing in PG workspace migration
danielaskdd Dec 19, 2025
c81e9c9
Register pgvector codec in pool init for consistent vector handling
danielaskdd Dec 19, 2025
e12dfdb
Bootstrap vector extension before pool creation
danielaskdd Dec 19, 2025
1aa4a3a
Fix PostgreSQL index lookup failure for long table names
danielaskdd Dec 19, 2025
0ac35bf
Prevent mutation of shared EmbeddingFunc instances
danielaskdd Dec 20, 2025
e596512
Fix `__post_init__` usage in Mongo and Qdrant storage implementations
danielaskdd Dec 20, 2025
9726431
Improve vector storage logging and migration warnings
danielaskdd Dec 20, 2025
0987517
Refine migration warning messages for PG and Qdrant
danielaskdd Dec 20, 2025
c65d606
Correct comments regarding __post_init__ invocation sources
danielaskdd Dec 20, 2025
7618de4
Refine Qdrant legacy collection lookup with model suffix support
danielaskdd Dec 20, 2025
9381dee
Elevate manual deletion log to warning level
danielaskdd Dec 20, 2025
9c52e32
Fix legacy collection name in Qdrant warning log
danielaskdd Dec 20, 2025
caed4fb
Add model_name attribute to embedding wrappers
danielaskdd Dec 20, 2025
77ed23a
Fix markdown table formatting in README files
danielaskdd Dec 20, 2025
ff19a67
Add model_suffix argument to Qdrant tests
danielaskdd Dec 20, 2025
2228a75
Fix NumPy ambiguity and array support in Postgres
danielaskdd Dec 21, 2025
8ef86c4
Refactor PG vector storage and add index creation
danielaskdd Dec 21, 2025
5fef7e4
Skip legacy vector table init in Postgres and fix migration checks
danielaskdd Dec 21, 2025
be744a2
Update Postgres tests for keyset pagination and API changes
danielaskdd Dec 21, 2025
afe3f37
Update PG mismatch tests to expect errors
danielaskdd Dec 21, 2025
2 changes: 1 addition & 1 deletion .github/workflows/tests.yml
@@ -13,7 +13,7 @@ jobs:

     strategy:
       matrix:
-        python-version: ['3.12', '3.13', '3.14']
+        python-version: ['3.12', '3.14']

     steps:
       - uses: actions/checkout@v6
31 changes: 31 additions & 0 deletions lightrag/base.py
@@ -220,6 +220,37 @@ class BaseVectorStorage(StorageNameSpace, ABC):
     cosine_better_than_threshold: float = field(default=0.2)
     meta_fields: set[str] = field(default_factory=set)

+    def __post_init__(self):
+        """Validate required embedding_func for vector storage."""
+        if self.embedding_func is None:
+            raise ValueError(
+                "embedding_func is required for vector storage. "
+                "Please provide a valid EmbeddingFunc instance."
+            )
+
+    def _generate_collection_suffix(self) -> str | None:
+        """Generates collection/table suffix from embedding_func.
+
+        Return suffix if model_name exists in embedding_func, otherwise return None.
+        Note: embedding_func is guaranteed to exist (validated in __post_init__).
+
+        Returns:
+            str | None: Suffix string e.g. "text_embedding_3_large_3072d", or None if model_name not available
+        """
+        import re
+
+        # Check if model_name exists (model_name is optional in EmbeddingFunc)
+        model_name = getattr(self.embedding_func, "model_name", None)
+        if not model_name:
+            return None
+
+        # embedding_dim is required in EmbeddingFunc
+        embedding_dim = self.embedding_func.embedding_dim
+
+        # Generate suffix: clean model name and append dimension
+        safe_model_name = re.sub(r"[^a-zA-Z0-9_]", "_", model_name.lower())
+        return f"{safe_model_name}_{embedding_dim}d"
+
     @abstractmethod
     async def query(
         self, query: str, top_k: int, query_embedding: list[float] = None
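The suffix rule added to `BaseVectorStorage` can be tried in isolation. The sketch below copies the same regex and format string into a free function; `FakeEmbeddingFunc` and `generate_collection_suffix` are hypothetical names used only for illustration and are not part of LightRAG.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeEmbeddingFunc:
    """Hypothetical stand-in for EmbeddingFunc with only the attributes used here."""

    embedding_dim: int
    model_name: Optional[str] = None


def generate_collection_suffix(embedding_func) -> Optional[str]:
    """Mirror of the suffix rule in the hunk above, written as a free function."""
    # model_name is optional; without it no suffix is produced.
    model_name = getattr(embedding_func, "model_name", None)
    if not model_name:
        return None
    # Lowercase, then replace every character outside [a-zA-Z0-9_] with "_".
    safe_model_name = re.sub(r"[^a-zA-Z0-9_]", "_", model_name.lower())
    return f"{safe_model_name}_{embedding_func.embedding_dim}d"


print(generate_collection_suffix(FakeEmbeddingFunc(3072, "text-embedding-3-large")))
print(generate_collection_suffix(FakeEmbeddingFunc(1024, "BAAI/bge-m3")))
print(generate_collection_suffix(FakeEmbeddingFunc(1024)))
```

Two models with the same dimension still get different suffixes, and a missing `model_name` yields `None`, which is the case the legacy (unsuffixed) naming has to cover.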
4 changes: 2 additions & 2 deletions lightrag/exceptions.py
@@ -128,8 +128,8 @@ def __init__(
         self.chunk_preview = truncated_preview


-class QdrantMigrationError(Exception):
-    """Raised when Qdrant data migration from legacy collections fails."""
+class DataMigrationError(Exception):
+    """Raised when data migration from legacy collection/table fails."""

     def __init__(self, message: str):
         super().__init__(message)
16 changes: 16 additions & 0 deletions lightrag/kg/faiss_impl.py
@@ -28,6 +28,7 @@ class FaissVectorDBStorage(BaseVectorStorage):
     """

     def __post_init__(self):
+        super().__post_init__()
         # Grab config values if available
         kwargs = self.global_config.get("vector_db_storage_cls_kwargs", {})
         cosine_threshold = kwargs.get("cosine_better_than_threshold")
@@ -358,9 +359,22 @@ def _load_faiss_index(self):
             )
             return

+        dim_mismatch = False
         try:
             # Load the Faiss index
             self._index = faiss.read_index(self._faiss_index_file)
+
+            # Verify dimension consistency between loaded index and embedding function
+            if self._index.d != self._dim:
+                error_msg = (
+                    f"Dimension mismatch: loaded Faiss index has dimension {self._index.d}, "
+                    f"but embedding function expects dimension {self._dim}. "
+                    f"Please ensure the embedding model matches the stored index or rebuild the index."
+                )
+                logger.error(error_msg)
+                dim_mismatch = True
+                raise ValueError(error_msg)
+
             # Load metadata
             with open(self._meta_file, "r", encoding="utf-8") as f:
                 stored_dict = json.load(f)
@@ -375,6 +389,8 @@ def _load_faiss_index(self):
                 f"[{self.workspace}] Faiss index loaded with {self._index.ntotal} vectors from {self._faiss_index_file}"
             )
         except Exception as e:
+            if dim_mismatch:
+                raise
             logger.error(
                 f"[{self.workspace}] Failed to load Faiss index or metadata: {e}"
             )
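The `dim_mismatch` flag exists so that the broad `except Exception` handler does not swallow the new `ValueError`. A minimal sketch of that control flow, without a Faiss dependency: `FakeIndex` and `load_index` are hypothetical names, and since the rest of the real `except` branch is not visible in the hunk, the sketch simply returns `None` there as a placeholder.

```python
class FakeIndex:
    """Hypothetical stand-in for a loaded Faiss index; only exposes `d`."""

    def __init__(self, d: int):
        self.d = d


def load_index(read_index, expected_dim: int):
    """Load an index; re-raise dimension mismatches, tolerate other load errors."""
    dim_mismatch = False
    try:
        index = read_index()
        if index.d != expected_dim:
            # Set the flag before raising so the handler below can tell
            # this error apart from an ordinary load failure.
            dim_mismatch = True
            raise ValueError(
                f"Dimension mismatch: index has {index.d}, expected {expected_dim}"
            )
        return index
    except Exception as e:
        if dim_mismatch:
            raise  # configuration error: surface it to the caller
        # Placeholder for the recovery path, which the hunk does not show.
        print(f"Failed to load index: {e}")
        return None


def broken_reader():
    raise OSError("index file unreadable")


print(load_index(lambda: FakeIndex(768), 768).d)
print(load_index(broken_reader, 768))
```

A dedicated exception type caught ahead of `Exception` would give the same behavior without a flag; the flag keeps the change local to this one method.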
1 change: 1 addition & 0 deletions lightrag/kg/milvus_impl.py
@@ -934,6 +934,7 @@ def _create_collection_if_not_exist(self):
             raise

     def __post_init__(self):
+        super().__post_init__()
         # Check for MILVUS_WORKSPACE environment variable first (higher priority)
         # This allows administrators to force a specific workspace for all Milvus storage instances
         milvus_workspace = os.environ.get("MILVUS_WORKSPACE")
26 changes: 25 additions & 1 deletion lightrag/kg/mongo_impl.py
@@ -2131,8 +2131,32 @@ async def create_vector_index_if_not_exists(self):
             indexes = await indexes_cursor.to_list(length=None)
             for index in indexes:
                 if index["name"] == self._index_name:
+                    # Check if the existing index has matching vector dimensions
+                    existing_dim = None
+                    definition = index.get("latestDefinition", {})
+                    fields = definition.get("fields", [])
+                    for field in fields:
+                        if (
+                            field.get("type") == "vector"
+                            and field.get("path") == "vector"
+                        ):
+                            existing_dim = field.get("numDimensions")
+                            break
+
+                    expected_dim = self.embedding_func.embedding_dim
+
+                    if existing_dim is not None and existing_dim != expected_dim:
+                        error_msg = (
+                            f"Vector dimension mismatch! Index '{self._index_name}' has "
+                            f"dimension {existing_dim}, but current embedding model expects "
+                            f"dimension {expected_dim}. Please drop the existing index or "
+                            f"use an embedding model with matching dimensions."
+                        )
+                        logger.error(f"[{self.workspace}] {error_msg}")
+                        raise ValueError(error_msg)
+
                     logger.info(
-                        f"[{self.workspace}] vector index {self._index_name} already exist"
+                        f"[{self.workspace}] vector index {self._index_name} already exists with matching dimensions ({expected_dim})"
                     )
                     return

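The lookup added to `create_vector_index_if_not_exists` can be exercised against plain dictionaries. The sketch below is a rough standalone version under stated assumptions: `existing_vector_dim` and `check_dimension` are hypothetical helper names, and the sample index document only imitates the keys the hunk reads (`latestDefinition`, `fields`, `type`, `path`, `numDimensions`).

```python
from typing import Optional


def existing_vector_dim(index: dict, path: str = "vector") -> Optional[int]:
    """Return numDimensions of the vector field at `path`, or None if absent."""
    definition = index.get("latestDefinition", {})
    for field in definition.get("fields", []):
        if field.get("type") == "vector" and field.get("path") == path:
            return field.get("numDimensions")
    return None


def check_dimension(index: dict, expected_dim: int) -> None:
    """Raise ValueError only when a stored dimension exists and differs."""
    existing_dim = existing_vector_dim(index)
    if existing_dim is not None and existing_dim != expected_dim:
        raise ValueError(
            f"Vector dimension mismatch! Index '{index.get('name')}' has "
            f"dimension {existing_dim}, but current embedding model expects "
            f"dimension {expected_dim}."
        )


# Hypothetical index document shaped like the one the hunk inspects.
sample_index = {
    "name": "vector_knn_index",
    "latestDefinition": {
        "fields": [
            {"type": "filter", "path": "workspace"},
            {"type": "vector", "path": "vector", "numDimensions": 1536},
        ]
    },
}

check_dimension(sample_index, 1536)  # matching dimensions: no error
print(existing_vector_dim(sample_index))
```

An index whose definition carries no vector field returns `None` and passes the check, matching the `existing_dim is not None` guard in the diff.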
1 change: 1 addition & 0 deletions lightrag/kg/nano_vector_db_impl.py
@@ -25,6 +25,7 @@
 @dataclass
 class NanoVectorDBStorage(BaseVectorStorage):
     def __post_init__(self):
+        super().__post_init__()
         # Initialize basic attributes
         self._client = None
         self._storage_lock = None
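The one-line `super().__post_init__()` additions in the Faiss, Milvus and NanoVectorDB storages matter because a dataclass subclass that defines its own `__post_init__` replaces the parent's hook rather than extending it. A minimal sketch of that behavior, using hypothetical classes (`BaseStore`, `ChainedStore`, `UnchainedStore`) rather than the real storage classes:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BaseStore:
    embedding_func: Optional[Callable] = None

    def __post_init__(self):
        # Same kind of validation as the base-class hunk in this PR.
        if self.embedding_func is None:
            raise ValueError("embedding_func is required for vector storage.")


@dataclass
class ChainedStore(BaseStore):
    def __post_init__(self):
        super().__post_init__()  # run the base validation first
        self._client = None


@dataclass
class UnchainedStore(BaseStore):
    def __post_init__(self):
        # No super() call: the base validation is silently skipped.
        self._client = None


store = UnchainedStore()  # constructs without error despite the missing function
print(store.embedding_func)

try:
    ChainedStore()
except ValueError as exc:
    print(f"rejected: {exc}")
```

Without the chained call, a storage instance with no embedding function would be created successfully and fail later at first use.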