feat: Add Automatic Text Truncation Support for Embedding Functions #2523

Merged
danielaskdd merged 5 commits into HKUDS:main from danielaskdd:embedding-max-token
Dec 22, 2025

Conversation

@danielaskdd
Collaborator

Add Automatic Text Truncation Support for Embedding Functions

Summary

This PR enhances the EmbeddingFunc wrapper to automatically inject the max_token_size parameter into underlying embedding functions that support it. This enables automatic text truncation for embedding operations, preventing API errors caused by texts that exceed model token limits.

Changes

Core Enhancement: lightrag/utils.py

  • Added import inspect for function signature introspection
  • Added automatic max_token_size injection logic in EmbeddingFunc.__call__:
    • Uses inspect.signature() to check if the underlying function supports max_token_size
    • Only injects when the parameter is supported (avoids TypeError for unsupported functions)
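
A minimal sketch of this injection pattern (the wrapper function below is illustrative; the actual EmbeddingFunc internals may differ):

```python
import inspect

async def call_with_optional_truncation(func, texts, max_token_size=None, **kwargs):
    """Call an embedding function, injecting max_token_size only if supported."""
    if max_token_size is not None:
        # Inspect the wrapped function's signature; injecting the keyword
        # blindly would raise TypeError for functions that do not accept it.
        if "max_token_size" in inspect.signature(func).parameters:
            kwargs["max_token_size"] = max_token_size
    return await func(texts, **kwargs)
```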

OpenAI Embedding: lightrag/llm/openai.py

  • Added import tiktoken for tokenization
  • Added _TIKTOKEN_ENCODING_CACHE module-level cache and _get_tiktoken_encoding_for_model() helper
  • Added max_token_size parameter to openai_embed() function
  • Implemented client-side text truncation using tiktoken (OpenAI API may return errors for over-limit texts)
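
A sketch of the client-side truncation path (the helper and cache names mirror the bullets above, but the bodies are illustrative assumptions):

```python
import tiktoken

# Module-level cache so the encoding is resolved once per model.
_TIKTOKEN_ENCODING_CACHE = {}

def _get_tiktoken_encoding_for_model(model):
    if model not in _TIKTOKEN_ENCODING_CACHE:
        try:
            _TIKTOKEN_ENCODING_CACHE[model] = tiktoken.encoding_for_model(model)
        except KeyError:
            # Unknown model names fall back to a default encoding.
            _TIKTOKEN_ENCODING_CACHE[model] = tiktoken.get_encoding("cl100k_base")
    return _TIKTOKEN_ENCODING_CACHE[model]

def _truncate_texts(texts, model, max_token_size):
    """Trim each text to at most max_token_size tokens before calling the API."""
    encoding = _get_tiktoken_encoding_for_model(model)
    truncated = []
    for text in texts:
        tokens = encoding.encode(text)
        if len(tokens) > max_token_size:
            text = encoding.decode(tokens[:max_token_size])
        truncated.append(text)
    return truncated
```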

Gemini Embedding: lightrag/llm/gemini.py

  • Added max_token_size parameter to gemini_embed() function
  • No client-side truncation - the Gemini API handles truncation automatically (autoTruncate=True by default)

Ollama Embedding: lightrag/llm/ollama.py

  • Added max_token_size parameter to ollama_embed() function
  • Added comprehensive docstring
  • No client-side truncation - the Ollama API handles truncation automatically based on the num_ctx setting
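
No truncation code is added for Ollama; as an illustration (an assumption about typical ollama-python usage, not code from this PR), the limit is driven server-side by num_ctx:

```python
import ollama

# The Ollama server truncates input that exceeds the context window set via
# num_ctx; nothing is trimmed on the client side.
response = ollama.embed(
    model="bge-m3",
    input=["a very long document ..."],
    options={"num_ctx": 8192},
)
embeddings = response["embeddings"]
```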

Minor Updates

  • lightrag/api/lightrag_server.py: Updated log message for clarity
  • lightrag/operate.py: Changed the summary token warning threshold from 90% to 100% and improved the warning message

Truncation Strategies by Provider

Provider  Truncation Strategy          Reason
OpenAI    Client-side (tiktoken)       API may return errors for over-limit texts
Gemini    Server-side (autoTruncate)   API automatically truncates
Ollama    Server-side (num_ctx)        API automatically truncates

How It Works

User calls: await embedding_func(texts)
    ↓
EmbeddingFunc.__call__:
    1. inspect.signature() checks if func supports max_token_size
    2. If supported → inject max_token_size from decorator
    ↓
embedding_func(texts, max_token_size=...):
    - openai_embed: Client-side truncation with tiktoken
    - gemini_embed / ollama_embed: Server-side automatic truncation
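
In practice, wiring this up might look roughly as follows (the EmbeddingFunc fields and openai_embed keywords reflect lightrag's existing API, but treat the exact signatures as assumptions):

```python
from lightrag.utils import EmbeddingFunc
from lightrag.llm.openai import openai_embed

# max_token_size set on the wrapper is forwarded to openai_embed because its
# signature accepts the parameter; a custom embedding function without that
# parameter would simply never receive it.
embedding_func = EmbeddingFunc(
    embedding_dim=1536,
    max_token_size=8192,
    func=lambda texts: openai_embed(texts, model="text-embedding-3-small"),
)

# embeddings = await embedding_func(texts)  # over-limit texts are truncated first
```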

Backward Compatibility

  • ✅ Fully backward compatible
  • Embedding functions without a max_token_size parameter continue to work (the signature check prevents injection)
  • No breaking changes to existing API

- Auto-inject max_token_size in wrapper
- Implement OpenAI client-side truncation
- Update Gemini/Ollama embed signatures
- Relax summary token warning threshold
- Update server startup logging
- Set cache env var before import
- Support raw encoding names
- Add cl100k_base to default list
- Improve cache path resolution
- Fix typo in log message
- Add missing closing parenthesis

@danielaskdd danielaskdd merged commit dca23e2 into HKUDS:main Dec 22, 2025
3 checks passed
@danielaskdd danielaskdd deleted the embedding-max-token branch December 22, 2025 17:39
