Feat: Add Workspace Isolation to Resolve Multi-Instance Concurrency Interference #2366

Closed
danielaskdd wants to merge 20 commits into main from dev-workspace-isolation

@danielaskdd danielaskdd commented Nov 16, 2025

Feat: Add Workspace Isolation to Resolve Multi-Instance Concurrency Interference

🎯 Problem Statement

When multiple LightRAG objects with different workspace values are instantiated simultaneously, the following issues occur:

  1. Pipeline Status Sharing Conflicts: All workspaces share a single pipeline_status, causing pipeline states from different workspaces to interfere with each other
  2. Lock Mechanism Deficiency: Existing locks (_pipeline_status_lock, _graph_db_lock, _storage_lock) are not workspace-isolated, causing operations from different workspaces to block each other unnecessarily
  3. In-Memory JSON KV Storage Lacks Workspace Isolation: The related namespace functions take no workspace parameter, which prevents true workspace isolation

✨ Solution

1. Workspace Isolation for Pipeline Status

  • Treat pipeline_status as a special namespace (storage type), similar to KV storage but without persistence
  • Create independent pipeline_status namespace for each workspace
  • Namespace format: <workspace>:pipeline_status

2. Unified Workspace-Based Lock Mechanism

  • Remove legacy global locks: _pipeline_status_lock, _graph_db_lock, _storage_lock
  • Introduce unified keyed lock mechanism: implemented via _storage_keyed_lock
  • Lock namespace: <workspace>:<storage_type>
  • Lock key: Fixed as default_key
  • Benefits: Fine-grained workspace-level isolation, avoiding cross-workspace lock contention
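The effect of keying locks by `<workspace>:<storage_type>` can be sketched with plain `asyncio.Lock` objects. This is a simplified stand-in, not the `_storage_keyed_lock` implementation: the `KeyedLocks` class is hypothetical, and it folds the fixed `default_key` into the namespace string.

```python
import asyncio


class KeyedLocks:
    """Hands out one asyncio.Lock per namespace string (illustrative stand-in)."""

    def __init__(self) -> None:
        self._locks = {}

    def get(self, workspace: str, storage_type: str) -> asyncio.Lock:
        namespace = f"{workspace}:{storage_type}"
        if namespace not in self._locks:
            self._locks[namespace] = asyncio.Lock()
        return self._locks[namespace]


async def demo() -> list:
    locks = KeyedLocks()
    events = []

    async def worker(name: str, workspace: str) -> None:
        async with locks.get(workspace, "pipeline_status"):
            events.append(f"{name} start")
            await asyncio.sleep(0.01)
            events.append(f"{name} end")

    # a1 and a2 share a workspace, so they serialize; b1 uses its own lock.
    await asyncio.gather(
        worker("a1", "project_a"),
        worker("a2", "project_a"),
        worker("b1", "project_b"),
    )
    return events


events = asyncio.run(demo())
print(events)
```

In the recorded events, `b1` starts while `a1` is still holding its lock, whereas `a2` only starts after `a1` has finished.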

3. New get_namespace_lock() Function

def get_namespace_lock(
    namespace: str, 
    workspace: str | None = None, 
    enable_logging: bool = False
) -> NamespaceLock
  • Simplifies namespace-level lock acquisition
  • Automatically handles workspace and namespace combination
  • Unified lock interface, replacing multiple independent locks

4. Add Workspace Parameter to All Namespace Operations

Updated function signatures to support workspace parameter:

  • initialize_pipeline_status(workspace: str | None = None)
  • get_namespace_data(namespace: str, first_init: bool = False, workspace: str | None = None)
  • get_update_flag(namespace: str, workspace: str | None = None)
  • set_all_update_flags(namespace: str, workspace: str | None = None)
  • clear_all_update_flags(namespace: str, workspace: str | None = None)
  • get_all_update_flags_status(workspace: str | None = None)
  • try_initialize_namespace(namespace: str, workspace: str | None = None)

5. Default Workspace Support (Backward Compatibility)

  • Added global variable _default_workspace
  • Added function set_default_workspace(workspace: str | None = None)
  • Added function get_default_workspace() -> str
  • Purpose: Maintain compatibility with legacy code that doesn't provide workspace parameter
  • Behavior: Automatically use default workspace when workspace parameter is None
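The fallback behaviour can be pictured roughly as follows. The function names and signatures come from the list above, but the bodies are assumptions, as are the `resolve_workspace` helper and the choice to return an empty string when no default has been set.

```python
from typing import Optional

# Module-level default, set once by the first LightRAG-like instance (assumed).
_default_workspace: Optional[str] = None


def set_default_workspace(workspace: Optional[str] = None) -> None:
    global _default_workspace
    _default_workspace = workspace


def get_default_workspace() -> str:
    # Assumption: an unset default behaves like the empty workspace "".
    return _default_workspace or ""


def resolve_workspace(workspace: Optional[str] = None) -> str:
    """Hypothetical helper: explicit workspace wins, None falls back to the default."""
    return get_default_workspace() if workspace is None else workspace


set_default_workspace("tenant_a")
print(resolve_workspace())            # legacy call without a workspace argument
print(resolve_workspace("tenant_b"))  # explicit workspace is used as given
```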

6. Unified Namespace Naming Convention

Added get_final_namespace() function:

def get_final_namespace(namespace: str, workspace: str | None = None) -> str
  • Centralized logic for combining workspace and namespace
  • Format: <workspace>:<namespace> or <namespace> (when workspace is empty)
  • Ensures consistent naming across all namespace operations
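A minimal sketch of the combining rule, assuming that a `None` workspace falls back to a module-level default (empty here). The signature follows the one shown above; the body is an illustration, not the PR's code.

```python
from typing import Optional

# Assumed default used when the caller passes workspace=None.
_default_workspace = ""


def get_final_namespace(namespace: str, workspace: Optional[str] = None) -> str:
    """Combine workspace and namespace as "<workspace>:<namespace>".

    An empty workspace yields the bare namespace, with no leading colon.
    """
    if workspace is None:
        workspace = _default_workspace
    return f"{workspace}:{namespace}" if workspace else namespace


print(get_final_namespace("pipeline_status", "project_a"))
print(get_final_namespace("pipeline_status", ""))
```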

7. Standardize empty workspace handling from "_" to "" across storage

  • Unify empty workspace behavior by changing workspace from "_" to ""
  • Fixed incorrect empty workspace detection in get_all_update_flags_status()

8. Auto-initialize pipeline status in initialize_storages()

  • Remove manual initialize_pipeline_status() calls across the codebase
  • Auto-initialize pipeline status in the initialize_storages() method
  • Update error and warning messages for clarity
  • Update docs and examples

📝 Key Modified Files

  • lightrag/kg/shared_storage.py: Core modification file

    • Added workspace isolation logic
    • Implemented get_namespace_lock()
    • Implemented get_final_namespace()
    • Added default workspace support
    • Added workspace parameter to all namespace operation functions
  • Storage Implementation Files (using new lock mechanism):

    • lightrag/kg/json_kv_impl.py
    • lightrag/kg/json_doc_status_impl.py
    • lightrag/kg/nano_vector_db_impl.py
    • lightrag/kg/faiss_impl.py
    • lightrag/kg/networkx_impl.py
    • All storage implementations now use get_namespace_lock() instead of legacy locks
  • API and Core Logic Files:

    • lightrag/lightrag.py: Set default workspace
    • lightrag/api/lightrag_server.py: Pipeline status initialization
    • lightrag/api/routers/document_routes.py: Use new namespace lock interface

🧪 Testing Recommendations

  1. Multi-Workspace Concurrency Test: Create multiple LightRAG instances with different workspaces simultaneously, verify no interference
  2. Pipeline Status Isolation Test: Verify pipeline status for different workspaces runs independently
  3. Backward Compatibility Test: Verify legacy code without workspace specification still works correctly
  4. Lock Mechanism Test: Verify new keyed lock mechanism works correctly without deadlocks

🎉 Expected Outcomes

  • ✅ Complete workspace-level isolation
  • ✅ LightRAG instances with different workspaces can run concurrently without interference
  • ✅ Pipeline status no longer interferes across workspaces
  • ✅ Optimized lock granularity, reduced unnecessary lock contention
  • ✅ 100% backward compatible with existing code

chengjie and others added 4 commits November 13, 2025 22:31
Problem:
In multi-tenant scenarios, different workspaces share a single global
pipeline_status namespace, causing pipelines from different tenants to
block each other, severely impacting concurrent processing performance.

Solution:
- Extended get_namespace_data() to recognize workspace-specific pipeline
  namespaces with pattern "{workspace}:pipeline" (following GraphDB pattern)
- Added workspace parameter to initialize_pipeline_status() for per-tenant
  isolated pipeline namespaces
- Updated all 7 call sites to use workspace-aware locks:
  * lightrag.py: process_document_queue(), aremove_document()
  * document_routes.py: background_delete_documents(), clear_documents(),
    cancel_pipeline(), get_pipeline_status(), delete_documents()

Impact:
- Different workspaces can process documents concurrently without blocking
- Backward compatible: empty workspace defaults to "pipeline_status"
- Maintains fail-fast: uninitialized pipeline raises clear error
- Expected N× performance improvement for N concurrent tenants

Bug fixes:
- Fixed AttributeError by using self.workspace instead of self.global_config
- Fixed pipeline status endpoint to show workspace-specific status
- Fixed delete endpoint to check workspace-specific busy flag

Code changes: 4 files, 141 insertions(+), 28 deletions(-)

Testing: All syntax checks passed, comprehensive workspace isolation tests completed

Fixes two compatibility issues in workspace isolation:

1. Problem: lightrag_server.py calls initialize_pipeline_status()
   without workspace parameter, causing pipeline to initialize in
   global namespace instead of rag's workspace.

   Solution: Add set_default_workspace() mechanism in shared_storage.
   LightRAG.initialize_storages() now sets default workspace, which
   initialize_pipeline_status() uses when called without parameters.

2. Problem: /health endpoint hardcoded to use "pipeline_status",
   cannot return workspace-specific status or support frontend
   workspace selection.

   Solution: Add LIGHTRAG-WORKSPACE header support. Endpoint now
   extracts workspace from header or falls back to server default,
   returning correct workspace-specific pipeline status.

Changes:
- lightrag/kg/shared_storage.py: Add set/get_default_workspace()
- lightrag/lightrag.py: Call set_default_workspace() in initialize_storages()
- lightrag/api/lightrag_server.py: Add get_workspace_from_request() helper,
  update /health endpoint to support LIGHTRAG-WORKSPACE header

Testing:
- Backward compatibility: Old code works without modification
- Multi-instance safety: Explicit workspace passing preserved
- /health endpoint: Supports both default and header-specified workspaces

Related: #2353
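The header fallback described in this commit can be sketched without any web framework. The function below is a hypothetical, simplified analogue of the `get_workspace_from_request()` helper: it takes a plain mapping of headers rather than a request object, and the whitespace trimming is an assumption.

```python
from typing import Mapping


def get_workspace_from_headers(
    headers: Mapping[str, str], default_workspace: str = ""
) -> str:
    """Pick the workspace from the LIGHTRAG-WORKSPACE header, else the server default."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    value = normalized.get("lightrag-workspace", "").strip()
    return value or default_workspace


print(get_workspace_from_headers({"LIGHTRAG-WORKSPACE": "project_a"}, "server_default"))
print(get_workspace_from_headers({}, "server_default"))
```

A missing or blank header resolves to the server default, which keeps existing clients working unchanged.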
- Remove DB-specific workspace configs
- Add default workspace auto-setting
- Replace global locks with namespace locks
- Simplify pipeline status management
- Remove redundant graph DB locking
@danielaskdd
Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

- Fix final_namespace error in get_namespace_data()
- Fix get_workspace_from_request return type
- Add workspace param to pipeline status calls
• Add NamespaceLock class wrapper
• Fix lock re-entrance issues
• Enable concurrent lock usage
• Fresh context per async with block
• Update get_namespace_lock API
@danielaskdd
Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

- Use ContextVar for per-coroutine storage
- Prevent state interference between coroutines
- Add re-entrance protection check
* Unify empty workspace behavior by changing workspace from "_" to ""
* Fixed incorrect empty workspace detection in get_all_update_flags_status()
- Add check for bare "pipeline_status"
- Handle namespace without prefix
@danielaskdd
Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

• Handle namespaces with/without prefixes
• Fix workspace matching logic
@danielaskdd
Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

* Acquire lock before setting ContextVar
* Prevent state corruption on cancellation
* Fix permanent lock brick scenario
* Store context only after success
* Handle acquisition failure properly
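The ordering fix listed here (acquire first, record state only after success) together with the re-entrance check can be sketched as below. `NamespaceLockSketch` is a hypothetical reduction, not the PR's `NamespaceLock`: it wraps a plain `asyncio.Lock` and keeps a single boolean in a `ContextVar`.

```python
import asyncio
from contextvars import ContextVar


class NamespaceLockSketch:
    """Non-re-entrant async lock wrapper with per-task state (illustration only)."""

    def __init__(self, lock: asyncio.Lock) -> None:
        self._lock = lock
        # Each asyncio task runs in its own copy of the context, so this flag
        # is tracked per task even though the wrapper instance is shared.
        self._held = ContextVar("namespace_lock_held", default=False)

    async def __aenter__(self) -> "NamespaceLockSketch":
        if self._held.get():
            raise RuntimeError("lock already held in this coroutine")
        # Acquire first. If this await is cancelled, no state has been
        # recorded yet, so a later attempt is not wrongly rejected.
        await self._lock.acquire()
        self._held.set(True)  # record only after the acquisition succeeded
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        self._held.set(False)
        self._lock.release()


async def demo() -> dict:
    guard = NamespaceLockSketch(asyncio.Lock())
    result = {"reentrant_error": False, "concurrent_ok": 0}

    async with guard:
        try:
            async with guard:  # nested use in the same coroutine
                pass
        except RuntimeError:
            result["reentrant_error"] = True

    async def worker() -> None:
        async with guard:  # the same instance, used from separate tasks
            await asyncio.sleep(0)
            result["concurrent_ok"] += 1

    await asyncio.gather(worker(), worker())
    return result


result = asyncio.run(demo())
print(result)
```

Setting the flag before `acquire()` would leave it stuck at `True` if the wait were cancelled, which is the kind of permanently unusable lock the commit describes.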
• Remove manual initialize_pipeline_status calls
• Auto-init in initialize_storages method
• Update error messages for clarity
• Warn on workspace conflicts
- Auto-init pipeline status in storages
- Remove redundant import statements
- Simplify initialization pattern
- Update docs and examples
@danielaskdd
Collaborator Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Keep them coming!

@BukeLy
Contributor

BukeLy commented Nov 17, 2025

@danielaskdd Hi! I need to fix the commit author attribution before this PR merges.

Issue:
Commit 5f15358 uses email chengjie@gmail.com which is NOT linked to my GitHub account @BukeLy. This means I won't get proper credit for my contribution.

My correct GitHub email:
bukely0119@foxmail.com

Request:
Could you please update the commit author to use my verified email? This is a quick fix:

git checkout dev-workspace-isolation
git rebase -i 5f153582^
# Change 'pick' to 'edit' for commit 5f153582
git commit --amend --author="BukeLy <bukely0119@foxmail.com>" --no-edit
git rebase --continue
git push --force

Why this matters:
Without this fix, GitHub won't attribute the commit to my account and I won't appear in Contributors despite doing this work in PR #2353.

I've already updated my own PR branch with the correct email. Thank you for your help! 🙏

Alternative (if rebasing is too risky):
Add Co-authored-by line:
Co-authored-by: BukeLy <bukely0119@foxmail.com>

Why this change is needed:
PR #2366 introduces critical workspace isolation functionality to resolve
multi-instance concurrency issues, but lacks comprehensive automated tests
to validate the implementation. Without proper test coverage, we cannot
ensure the feature works correctly across all scenarios mentioned in the PR.

What this test suite covers:
1. Pipeline Status Isolation: Verifies different workspaces maintain
   independent pipeline status without interference
2. Lock Mechanism: Validates the new keyed lock system works correctly
   - Different workspaces can acquire locks in parallel
   - Same workspace locks serialize properly
   - No deadlocks occur
3. Backward Compatibility: Ensures legacy code without workspace parameters
   continues to work using default workspace
4. Multi-Workspace Concurrency: Confirms multiple LightRAG instances with
   different workspaces can run concurrently without data interference

Testing approach:
- All tests are automated and deterministic
- Uses timing assertions to verify parallel vs serial lock behavior
- Validates data isolation through direct namespace data inspection
- Comprehensive error handling and detailed test output
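A timing-based check of this kind might look like the sketch below. It uses plain `asyncio.Lock` objects as stand-ins for per-workspace namespace locks. The 0.1 second hold time and the thresholds are arbitrary choices for illustration, and the names are not taken from the actual test file.

```python
import asyncio
import time


async def hold(lock: asyncio.Lock, seconds: float) -> None:
    """Hold the lock for a fixed time to simulate work inside the critical section."""
    async with lock:
        await asyncio.sleep(seconds)


async def elapsed(lock_1: asyncio.Lock, lock_2: asyncio.Lock, seconds: float = 0.1) -> float:
    """Run two holders concurrently and return the wall-clock time taken."""
    start = time.perf_counter()
    await asyncio.gather(hold(lock_1, seconds), hold(lock_2, seconds))
    return time.perf_counter() - start


async def main() -> tuple:
    ws_a, ws_b = asyncio.Lock(), asyncio.Lock()
    parallel = await elapsed(ws_a, ws_b)  # different workspaces: holds overlap
    serial = await elapsed(ws_a, ws_a)    # same workspace: holds run back to back
    return parallel, serial


parallel, serial = asyncio.run(main())
print(f"parallel={parallel:.2f}s serial={serial:.2f}s")
```

Comparing the two measurements against each other, rather than against tight absolute limits, tends to make such a check less sensitive to machine load.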

Test results:
All 9 test cases passed successfully, confirming the workspace isolation
feature is working correctly across all key scenarios.

Impact:
Provides confidence that PR #2366's workspace isolation feature is
production-ready and won't introduce regressions.
BukeLy and others added 3 commits November 17, 2025 11:46
Why this enhancement is needed:
The initial test suite covered the 4 core scenarios from PR #2366, but
lacked comprehensive coverage of edge cases and implementation details.
This update adds 5 additional test scenarios to achieve complete validation
of the workspace isolation feature.

What was added:
Test 5 - NamespaceLock Re-entrance Protection (2 sub-tests):
  - Verifies re-entrance in same coroutine raises RuntimeError
  - Confirms same NamespaceLock instance works in concurrent coroutines

Test 6 - Different Namespace Lock Isolation:
  - Validates locks with same workspace but different namespaces are independent

Test 7 - Error Handling (2 sub-tests):
  - Tests None workspace conversion to empty string
  - Validates empty workspace creates correct namespace format

Test 8 - Update Flags Workspace Isolation (3 sub-tests):
  - set_all_update_flags isolation between workspaces
  - clear_all_update_flags isolation between workspaces
  - get_all_update_flags_status workspace filtering

Test 9 - Empty Workspace Standardization (2 sub-tests):
  - Empty workspace namespace format verification
  - Empty vs non-empty workspace independence

Test Results:
All 19 test cases passed (previously 9/9, now 19/19)
- 4 core PR requirements: 100% coverage
- 5 additional scenarios: 100% coverage
- Total coverage: 100% of workspace isolation implementation

Testing approach improvements:
- Proper initialization of update flags using get_update_flag()
- Correct handling of flag objects (.value property)
- Updated error handling tests to match actual implementation behavior
- All edge cases and boundary conditions validated

Impact:
Provides complete confidence in the workspace isolation feature with
comprehensive test coverage of all implementation details, edge cases,
and error handling paths.

Implemented two critical test scenarios:

Test 10 - JsonKVStorage Integration Test:
- Instantiate two JsonKVStorage instances with different workspaces
- Write different data to each instance (entity1, entity2)
- Read back and verify complete data isolation
- Verify workspace directories are created correctly
- Result: Data correctly isolated, no mixing between workspaces

Test 11 - LightRAG End-to-End Test:
- Instantiate two LightRAG instances with different workspaces
- Insert different documents to each instance
- Verify workspace directory structure (project_a/, project_b/)
- Verify file separation and data isolation
- Result: All 8 storage files created separately per workspace
- Document data correctly isolated between workspaces
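The directory-level isolation these two tests check can be mimicked with the standard library alone. The layout `<working_dir>/<workspace>/kv_store_<namespace>.json` and the `workspace_file` helper are assumptions for this sketch. It does not use JsonKVStorage or LightRAG.

```python
import json
import tempfile
from pathlib import Path


def workspace_file(working_dir: Path, workspace: str, namespace: str) -> Path:
    """Return the JSON file path for a namespace inside a workspace directory.

    Hypothetical layout; an empty workspace writes directly into working_dir.
    """
    base = working_dir / workspace if workspace else working_dir
    base.mkdir(parents=True, exist_ok=True)
    return base / f"kv_store_{namespace}.json"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    file_a = workspace_file(root, "project_a", "entities")
    file_b = workspace_file(root, "project_b", "entities")

    # Write different data through each workspace's file.
    file_a.write_text(json.dumps({"entity1": {"content": "from project_a"}}))
    file_b.write_text(json.dumps({"entity2": {"content": "from project_b"}}))

    # Read back and confirm neither workspace sees the other's keys.
    data_a = json.loads(file_a.read_text())
    data_b = json.loads(file_b.read_text())
    dirs = sorted(p.name for p in root.iterdir())

print(dirs, sorted(data_a), sorted(data_b))
```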

Test Results: 23/23 passed
- 19 unit tests
- 2 integration tests (JsonKVStorage data + file structure)
- 2 E2E tests (LightRAG file structure + data isolation)

Coverage: 100% - Unit, Integration, and E2E validated
test: Add comprehensive workspace isolation test suite for PR #2366
danielaskdd pushed a commit that referenced this pull request Nov 17, 2025
danielaskdd pushed a commit that referenced this pull request Nov 17, 2025
@danielaskdd danielaskdd deleted the dev-workspace-isolation branch November 17, 2025 05:14
raphaelmansuy pushed a commit to raphaelmansuy/LightRAG that referenced this pull request Dec 5, 2025
(cherry picked from commit 436e414)
raphaelmansuy pushed a commit to raphaelmansuy/LightRAG that referenced this pull request Dec 5, 2025


(cherry picked from commit 4742fc8)