Add dimensions parameter support to openai_embed() #2323

Merged: 1 commit merged into HKUDS:main from yrangana:feat/add-dimensions-parameter on Nov 7, 2025

@yrangana (Contributor) commented Nov 6, 2025

Description

Add support for the dimensions parameter in the openai_embed() function to enable API-level dimension reduction for embedding vectors. This fixes an issue where the embedding_dim parameter was accepted but never used in the API call.

Related Issues

This PR addresses the issue where embedding dimension reduction was not properly supported, leading to inefficient embedding generation and dimension mismatch errors in multi-tenant scenarios with shared Qdrant collections.

Changes Made

  • Added proper handling of the embedding_dim parameter in the openai_embed() function
  • Updated the function docstring to clarify that embedding_dim is used for dimension reduction
  • Implemented conditional logic to include the dimensions parameter in the API call only when embedding_dim is provided
  • Maintained backward compatibility for code that doesn't specify embedding_dim
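The conditional logic described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the helper name `build_embedding_kwargs` and its default model are assumptions made for the example. The resulting dictionary is the kind of argument set one would pass to an OpenAI-style embeddings call.

```python
from typing import Any, Dict, List, Optional


def build_embedding_kwargs(
    texts: List[str],
    model: str = "text-embedding-3-small",
    embedding_dim: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble keyword arguments for an embeddings request.

    The `dimensions` key is added only when `embedding_dim` is given, so
    callers that never set it send exactly the same request as before.
    """
    kwargs: Dict[str, Any] = {"model": model, "input": texts}
    if embedding_dim is not None:
        # Ask the API to return vectors reduced to this size.
        kwargs["dimensions"] = embedding_dim
    return kwargs


# Without embedding_dim the request is unchanged (backward compatible).
default_request = build_embedding_kwargs(["hello"])

# With embedding_dim the request carries the `dimensions` parameter.
reduced_request = build_embedding_kwargs(["hello"], embedding_dim=256)
```

Omitting the key entirely, rather than sending `dimensions=None`, keeps the request identical for existing callers.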

Checklist

  • Changes tested locally
  • Code reviewed
  • Documentation updated (function docstring)
  • Unit tests added (not applicable for this simple change)

Additional Notes

This change enables several benefits:

  • Smaller vectors give faster retrieval and lower storage costs
  • Enables multi-tenant shared Qdrant collections with different dimension requirements
  • Maintains 95-99% quality at reduced dimensions through Matryoshka Representation Learning
  • Compatible with LiteLLM and OpenAI-compatible APIs

Modern embedding models like text-embedding-3-small and text-embedding-3-large support dimension reduction via the API's dimensions parameter, which this PR now properly utilizes.
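For intuition about what dimension reduction does to a vector, a client-side approximation is to keep the leading components and re-normalize to unit length. The sketch below is not part of this PR, and the function name `truncate_and_normalize` is hypothetical. It is only meaningful for models trained so that leading components carry most of the information (the Matryoshka approach mentioned above).

```python
import math
from typing import List


def truncate_and_normalize(vec: List[float], dim: int) -> List[float]:
    """Hypothetical helper: keep the first `dim` components and rescale to unit L2 norm."""
    if dim <= 0 or dim > len(vec):
        raise ValueError(f"dim must be in 1..{len(vec)}, got {dim}")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return head
    return [x / norm for x in head]


# A toy 3-dimensional vector reduced to 2 dimensions.
reduced = truncate_and_normalize([3.0, 4.0, 12.0], 2)
```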

@danielaskdd (Collaborator)

This PR has two issues:

  1. Many OpenAI-compatible embedding services do not support the dimensions parameter. Forcing this parameter to be passed may result in runtime errors.
  2. The EmbeddingFunc class already stores the dimension information in the embedding_dim attribute during initialization, which is the designated source for dimension information during vector store initialization. Therefore, embedding_dim should be the single source of truth for dimension information, and the dimension value should not be redundantly passed when invoking the embedding function.
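One way to picture both review points together is a wrapper that owns the dimension and forwards it to the backend only on explicit opt-in. This is a sketch under stated assumptions, not the EmbeddingFunc class from the repository and not the implementation in PR #2328: the class `DimAwareEmbedder`, the flag `send_dimensions`, and the `fake_backend` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class DimAwareEmbedder:
    """Hypothetical wrapper: `embedding_dim` is the single source of truth."""

    embedding_dim: int
    func: Callable[..., List[List[float]]]
    send_dimensions: bool = False  # opt-in, since some services reject the parameter

    def __call__(self, texts: List[str]) -> List[List[float]]:
        kwargs = {}
        if self.send_dimensions:
            # The wrapper injects the dimension; callers never pass it themselves.
            kwargs["embedding_dim"] = self.embedding_dim
        vectors = self.func(texts, **kwargs)
        for vec in vectors:
            if len(vec) != self.embedding_dim:
                raise ValueError(
                    f"expected {self.embedding_dim} dimensions, got {len(vec)}"
                )
        return vectors


def fake_backend(texts: List[str], embedding_dim: Optional[int] = None) -> List[List[float]]:
    """Stand-in for a real embedding call; returns zero vectors of the requested size."""
    dim = embedding_dim if embedding_dim is not None else 8  # pretend native size is 8
    return [[0.0] * dim for _ in texts]


# Opt-in: the backend is asked for 4-dimensional vectors.
reduced = DimAwareEmbedder(embedding_dim=4, func=fake_backend, send_dimensions=True)

# Default: nothing extra is sent, so the backend's native size (8) must match.
native = DimAwareEmbedder(embedding_dim=8, func=fake_backend)
```

The length check makes a mismatch between the configured dimension and what the service returns fail at embedding time rather than later, when vectors are written to the store.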

@danielaskdd (Collaborator)

PR #2328 addresses the two issues mentioned above. Please pull the branch and verify if the implementation works as expected. Thank you.

@danielaskdd danielaskdd closed this pull request by merging all changes into HKUDS:main in f4492d4 Nov 7, 2025