Feat: Add Gemini Embedding Support to LightRAG #2329
- Implement gemini_embed function
- Add gemini to embedding binding choices
- Add L2 normalization for dims < 3072
|
@codex review |
💡 Codex Review
LightRAG/lightrag/api/lightrag_server.py
Lines 751 to 768 in de4ed73
The new gemini_embed implementation only applies output_dimensionality and L2 normalisation when the embedding_dim argument is non‑None (gemini.py lines 501‑539), but the server never forwards a value for that parameter unless the global EMBEDDING_SEND_DIM flag is manually enabled. In create_app the send_dimensions flag remains False for all non‑Jina bindings by default (lightrag_server.py lines 751‑768), so the optimized embedding function is invoked with embedding_dim=None, which means the Gemini API is always called with its default dimension and the normalisation branch never runs. As a result, neither the advertised default of 1536 dimensions nor any user‑supplied EMBEDDING_DIM or --gemini-embedding-task-type settings have any effect out of the box, yielding unnormalised embeddings of whatever size the API chooses. This makes the dynamic dimension and normalisation feature effectively non‑functional for Gemini unless users know to set an unrelated EMBEDDING_SEND_DIM override.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
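The forwarding gap described in the review can be illustrated with a minimal sketch. This is not the code from `create_app`; the function `resolve_embedding_dim` and its parameters are hypothetical names chosen for illustration. It only assumes what the review states: the dimension is forwarded when the `EMBEDDING_SEND_DIM` flag is enabled or when the binding is one that always sends it (Jina, per the review), and otherwise the embedding function receives `embedding_dim=None`.

```python
from typing import AbstractSet, Optional


def resolve_embedding_dim(
    binding: str,
    configured_dim: int,
    send_dim_flag: bool = False,
    always_send: AbstractSet[str] = frozenset({"jina"}),
) -> Optional[int]:
    """Return the dimension to forward to the embedding function, or None.

    Hypothetical helper: `send_dim_flag` stands in for EMBEDDING_SEND_DIM and
    `always_send` for the set of bindings that require the dimension.
    """
    send_dimensions = send_dim_flag or binding in always_send
    return configured_dim if send_dimensions else None


# Behavior the review describes: Gemini gets None unless the flag is set.
before_fix = resolve_embedding_dim("gemini", 1536)

# Treating Gemini like Jina makes the configured dimension reach the function.
after_fix = resolve_embedding_dim(
    "gemini", 1536, always_send=frozenset({"jina", "gemini"})
)
```

Under these assumptions, `before_fix` is `None` (so neither `output_dimensionality` nor normalization would apply), while `after_fix` carries the configured value.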
|
Treat Gemini the same as Jina, requiring embedding dimension parameter. |
|
@codex review |
|
Codex Review: Didn't find any major issues. Keep them coming! |
Feat: 🎯 Add Gemini Embedding Support to LightRAG
Summary
This PR adds comprehensive support for Google Gemini embeddings as a new embedding binding option in LightRAG, enabling users to leverage Gemini's gemini-embedding-001 model with advanced features including dynamic dimension control and automatic normalization.
🚀 Features
1. Gemini Embedding Function (lightrag/llm/gemini.py)
- EMBEDDING_DIM environment variable
2. Configuration Options (lightrag/llm/binding_options.py)
- GeminiEmbeddingOptions class with task_type parameter
3. Server Integration (lightrag/api/lightrag_server.py)
- LLMConfigCache support for Gemini embedding options
4. CLI/Environment Variable Support (lightrag/api/config.py)
- --embedding-binding choices
- --gemini-embedding-task-type
- GEMINI_EMBEDDING_TASK_TYPE
📝 Usage Examples
Environment Variables
Command Line
🔧 Technical Details
Normalization Strategy
Gemini API behavior:
Implementation automatically handles this:
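A minimal sketch of the normalization step, assuming only what the commit message and the review state: L2 normalization is applied when a requested dimension is given and is below 3072. The names `FULL_DIM`, `l2_normalize`, and `postprocess_embeddings` are hypothetical and are not taken from `gemini.py`; the sketch uses plain Python lists rather than whatever array type the real function returns.

```python
import math
from typing import List, Optional, Sequence

FULL_DIM = 3072  # threshold taken from the commit message ("dims < 3072")


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length; leave a zero vector unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def postprocess_embeddings(
    vectors: Sequence[Sequence[float]],
    embedding_dim: Optional[int],
) -> List[List[float]]:
    """Normalize only when a reduced dimension was explicitly requested."""
    if embedding_dim is None or embedding_dim >= FULL_DIM:
        # No dimension requested, or full size: return the vectors as received.
        return [list(v) for v in vectors]
    return [l2_normalize(v) for v in vectors]


# Toy 2-dimensional vector standing in for a truncated embedding.
normalized = postprocess_embeddings([[3.0, 4.0]], embedding_dim=2)
untouched = postprocess_embeddings([[3.0, 4.0]], embedding_dim=None)
```

Note that the `embedding_dim is None` branch skips normalization entirely, which is why the review's point about forwarding the dimension matters for this feature.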
Task Type Options
🧪 Testing Checklist
📄 Files Changed
- lightrag/llm/gemini.py - Added gemini_embed() function (+129 lines)
- lightrag/llm/binding_options.py - Added GeminiEmbeddingOptions class (+13 lines)
- lightrag/api/lightrag_server.py - Added Gemini embedding integration (+37 lines)
- lightrag/api/config.py - Exposed Gemini embedding CLI/ENV options (+22 lines)
🔄 Migration Guide
There are no breaking changes for existing users. The new feature is opt-in:
✅ Checklist