Feat: Add Optional Embedding Dimension Control with OpenAI API #2328
danielaskdd merged 12 commits into main
* Add EMBEDDING_SEND_DIM environment variable
* Update Jina/OpenAI embed functions
* Add send_dimensions to EmbeddingFunc
* Auto-inject embedding_dim when enabled
* Add parameter validation warnings
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
• Pass embedding_dim to jina_embed call
• Pass embedding_dim to openai_embed call
• Fix similarity search error in query stage
• Remove redundant null checks
• Improve log readability
Add Optional Embedding Dimension Control with OpenAI API
🎯 Overview
This PR implements proper dimension parameter handling for OpenAI and Jina embedding APIs, with mandatory dimension sending for Jina (API requirement) and optional control for OpenAI.
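The rule stated above (always send for Jina, opt-in for OpenAI, never for other bindings) can be sketched as a small decision function. This is an illustrative sketch only: resolve_send_dimensions and the two sample embed functions are hypothetical names, not code from this PR, and the signature check mirrors the "Has embedding_dim param?" step from the flow chart in this description.

```python
import inspect
import os
from typing import Callable, Mapping, Optional


def resolve_send_dimensions(
    binding: str,
    embed_func: Callable,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Hypothetical helper: decide whether embedding_dim is sent to the provider."""
    env = os.environ if env is None else env
    if binding == "jina":
        # Jina: always send the dimension (described as an API requirement).
        return True
    if binding != "openai":
        # Other bindings: dimension sending is not supported.
        return False
    # OpenAI: opt-in through EMBEDDING_SEND_DIM, which defaults to false.
    enabled = env.get("EMBEDDING_SEND_DIM", "false").strip().lower() == "true"
    # Only send if the embed function actually accepts embedding_dim.
    accepts_dim = "embedding_dim" in inspect.signature(embed_func).parameters
    return enabled and accepts_dim


def sample_embed_with_dim(texts, embedding_dim=None):
    """Assumed stand-in for an embed function that accepts embedding_dim."""
    return texts


def sample_embed_without_dim(texts):
    """Assumed stand-in for an embed function without embedding_dim."""
    return texts
```

Passing env explicitly, for example resolve_send_dimensions("openai", sample_embed_with_dim, {"EMBEDDING_SEND_DIM": "true"}), keeps the decision testable without touching the process environment.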
📋 Changes Summary
1. Embedding Function Updates
OpenAI (lightrag/llm/openai.py)
- Added embedding_dim parameter to openai_embed()
- EMBEDDING_SEND_DIM configuration
Jina (lightrag/llm/jina.py)
- Renamed dimensions parameter to embedding_dim for consistency
2. Smart Dimension Injection (lightrag/utils.py)
Enhanced EmbeddingFunc class with intelligent dimension management:
Features:
3. Provider-Specific Logic (lightrag/api/lightrag_server.py)
Implemented binding-aware dimension parameter control:
Logic Flow:
- Jina: send_dimensions forced to True (API requirement)
- OpenAI: send_dimensions controlled by EMBEDDING_SEND_DIM env var
- Other bindings: send_dimensions set to False (not supported)
4. Environment Configuration (env.example)
Updated documentation with clear guidance:
5. Prohibit direct access to internal functions of EmbeddingFunc in query stage
Update embedding in query stage to use embedding_func_config([query]) instead of embedding_func_config.func([query]), ensuring the EmbeddingFunc wrapper properly applies dimension configuration from EMBEDDING_SEND_DIM=true.
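Why the call site matters can be seen in a minimal sketch. EmbeddingFuncSketch below is a simplified stand-in written for illustration, not the actual EmbeddingFunc from lightrag/utils.py; fake_embed and the warning wording are likewise assumptions. It shows that calling the wrapper injects the declared embedding_dim, while calling .func directly skips the injection.

```python
import asyncio
import logging
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, List

logger = logging.getLogger("embedding_sketch")


@dataclass
class EmbeddingFuncSketch:
    """Simplified stand-in for EmbeddingFunc (assumption, not the real class)."""

    embedding_dim: int
    func: Callable[..., Awaitable[List[List[float]]]]
    send_dimensions: bool = False

    async def __call__(self, *args: Any, **kwargs: Any) -> List[List[float]]:
        if self.send_dimensions:
            supplied = kwargs.get("embedding_dim")
            if supplied is not None and supplied != self.embedding_dim:
                # Illustrative wording; the PR's actual warning text is not shown here.
                logger.warning(
                    "Ignoring embedding_dim=%s; using declared value %s",
                    supplied,
                    self.embedding_dim,
                )
            # Always inject the declared dimension when sending is enabled.
            kwargs["embedding_dim"] = self.embedding_dim
        return await self.func(*args, **kwargs)


async def fake_embed(texts, embedding_dim=None):
    """Hypothetical provider call: zero vectors of the requested size (4 if unset)."""
    dim = embedding_dim if embedding_dim is not None else 4
    return [[0.0] * dim for _ in texts]


wrapper = EmbeddingFuncSketch(embedding_dim=2, func=fake_embed, send_dimensions=True)
via_wrapper = asyncio.run(wrapper(["query"]))    # wrapper call: dimension injected
via_func = asyncio.run(wrapper.func(["query"]))  # direct .func call: injection bypassed
```

In this sketch the wrapper call yields 2-dimensional vectors and the direct call falls back to the fake provider's default size, which is the kind of mismatch that would surface as a similarity search error at query time.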
Fixed Locations:
- _perform_kg_search function (line ~1510): Pre-computing query embedding for all vector operations
- _find_related_text_unit_from_entities function (line ~1960): Entity-related chunk selection by vector similarity
- _find_related_text_unit_from_relations function (line ~2180): Relation-related chunk selection by vector similarity
🔧 Technical Details
Dimension Sending Decision Matrix
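| Binding | EMBEDDING_SEND_DIM | Dimension sent |
| Jina | true | Yes (forced) |
| Jina | false | Yes (forced) |
| OpenAI | true | Yes |
| OpenAI | false | No (default) |
| Others | true | No (not supported) |
| Others | false | No (not supported) |
Parameter Injection Flow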
graph TD
    A[Check Binding Type] --> B{Jina?}
    B -->|Yes| C[Force send_dimensions=True]
    B -->|No| D[Check EMBEDDING_SEND_DIM]
    D --> E{Environment Var?}
    E -->|true| F[Check Function Signature]
    E -->|false| G[send_dimensions=False]
    F --> H{Has embedding_dim param?}
    H -->|Yes| I[send_dimensions=True]
    H -->|No| G
    C --> J[Auto-inject dimension on each call]
    I --> J
    G --> K[Use function without dimension]
Validation Logic
When send_dimensions=True:
- Check if the user manually provided an embedding_dim parameter
- If the provided value differs from the class attribute, log a warning
- Always inject the decorator-declared embedding_dim value
🎁 Benefits
⚙️ Configuration Guide
For Jina (Automatic - No Configuration Needed)
For OpenAI (Optional Control)
For Other Backends (No Impact)
EMBEDDING_BINDING=ollama
EMBEDDING_DIM=1024
# EMBEDDING_SEND_DIM automatically ignored (not supported)
🧪 Backward Compatibility
✅ Fully Backward Compatible:
- Default behavior unchanged (EMBEDDING_SEND_DIM=false)
🔍 Testing Recommendations
Required Tests
- Jina with EMBEDDING_SEND_DIM=false - Dimension still sent (forced)
- Jina with EMBEDDING_SEND_DIM=true - Dimension sent (forced)
- OpenAI with EMBEDDING_SEND_DIM=true - Dimension sent
- OpenAI with EMBEDDING_SEND_DIM=false - Dimension not sent (default)
- embedding_dim parameter conflict
Test Scenarios
📊 Logging Examples
Jina Binding (Forced Dimension)
OpenAI with EMBEDDING_SEND_DIM=true
OpenAI with EMBEDDING_SEND_DIM=false (default)
🚀 Migration Path
For Existing Deployments
No action required - the changes are backward compatible:
For New Deployments
OpenAI users can optionally enable dimension reduction:
# Add to .env file
EMBEDDING_SEND_DIM=true
Jina users require no configuration - dimension parameter automatically sent.
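For OpenAI, enabling the setting means the configured dimension is included in the embeddings request as the dimensions field. The sketch below only builds the request arguments and makes no network call. build_embedding_kwargs is a hypothetical helper, not a function from this PR, and the model name is just an example.

```python
from typing import Any, Dict, List, Optional


def build_embedding_kwargs(
    texts: List[str],
    model: str,
    embedding_dim: Optional[int] = None,
    send_dimensions: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble arguments for an embeddings request."""
    kwargs: Dict[str, Any] = {"model": model, "input": texts}
    # Add `dimensions` only when sending is enabled and a value is configured,
    # so existing deployments keep sending the same request as before.
    if send_dimensions and embedding_dim is not None:
        kwargs["dimensions"] = embedding_dim
    return kwargs


# EMBEDDING_SEND_DIM=false (default): no dimensions field in the request.
default_request = build_embedding_kwargs(["hello"], "text-embedding-3-small", 1024)

# EMBEDDING_SEND_DIM=true: dimensions field carries the configured size.
reduced_request = build_embedding_kwargs(
    ["hello"], "text-embedding-3-small", 1024, send_dimensions=True
)
```

Leaving the key out entirely, rather than sending a null value, is one way to keep the disabled case identical to the request sent before this change.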
🔗 Related Information
API Requirements
- Jina: requires dimensions parameter for proper operation
- OpenAI: supports dimensions parameter for dimension reduction
Performance Impact
✅ Checklist