
Add ability to add extra fields to AzureSearch VectorStore when adding documents #6134

@CameronVetter

Description


Feature request

Currently, the AzureSearch VectorStore lets the user specify a filter that is applied to the search index (in the traditional search-engine sense) before the vector similarity search runs. This shrinks the search space, which improves speed and helps focus the vector search on the relevant subset of documents.
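To make the filter-then-search idea concrete, here is a minimal pure-Python sketch of it. This is not how Azure Cognitive Search implements filtering (the service does it server-side); the function names `cosine_similarity` and `filtered_vector_search` and the in-memory record layout are hypothetical and chosen only for illustration.

```python
import math
from typing import Any, Callable

Record = dict[str, Any]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filtered_vector_search(
    records: list[Record],
    query_vector: list[float],
    predicate: Callable[[Record], bool],
    k: int = 3,
) -> list[Record]:
    # Step 1: the traditional filter narrows the candidate set.
    candidates = [r for r in records if predicate(r)]
    # Step 2: only the surviving candidates are ranked by vector similarity.
    candidates.sort(
        key=lambda r: cosine_similarity(r["content_vector"], query_vector),
        reverse=True,
    )
    return candidates[:k]


records = [
    {"id": "a", "important_field_1": 123, "content_vector": [1.0, 0.0]},
    {"id": "b", "important_field_1": 999, "content_vector": [1.0, 0.1]},
    {"id": "c", "important_field_1": 123, "content_vector": [0.0, 1.0]},
]
hits = filtered_vector_search(
    records, [1.0, 0.0], lambda r: r["important_field_1"] == 123, k=2
)
print([h["id"] for h in hits])  # ['a', 'c']
```

Record "b" is very close to the query vector but never gets ranked, because the filter removes it first. That is exactly why filterable fields need to exist on each stored document.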

This filtering feature is hard to use effectively because the current method for adding documents (add_texts) only populates the id, content, content_vector, and metadata fields. None of these fields is suitable for filtering, so the user has to go back and add fields to the search index manually.

I propose letting the end user specify extra fields that are stored alongside each vector when documents are added. The end user would do something like this:

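# Proposed usage: extra_fields would be forwarded through add_documents to add_texts
extra_fields = {"extra_fields": {"important_field_1": 123, "important_field_2": 456}}

documents = []  # doc1, doc2, doc3 are LangChain Document objects
documents.append(doc1)
documents.append(doc2)
documents.append(doc3)

# Every document in this batch would be stored with important_field_1 and important_field_2
vector_store.add_documents(documents, **extra_fields)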
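A minimal sketch of how add_texts could merge the extra fields into each uploaded record. It assumes the record shape (`@search.action`, `id`, `content`, `content_vector`, and `metadata` as a JSON string) roughly mirrors what the current add_texts uploads; `build_upload_batch` and `BUILT_IN_FIELDS` are hypothetical names, and embedding is left out by passing precomputed vectors. The extra fields would presumably also have to exist in the index schema (and be marked filterable) or the service would reject the upload.

```python
import json
import uuid
from typing import Any, Optional

# Assumed default field names of the AzureSearch index schema.
BUILT_IN_FIELDS = {"@search.action", "id", "content", "content_vector", "metadata"}


def build_upload_batch(
    texts: list[str],
    vectors: list[list[float]],
    metadatas: Optional[list[dict[str, Any]]] = None,
    extra_fields: Optional[dict[str, Any]] = None,
) -> list[dict[str, Any]]:
    """Hypothetical helper: one upload record per text, with extra_fields copied into each."""
    metadatas = metadatas if metadatas is not None else [{} for _ in texts]
    if not (len(texts) == len(vectors) == len(metadatas)):
        raise ValueError("texts, vectors and metadatas must have the same length")
    extra_fields = extra_fields or {}
    clashes = BUILT_IN_FIELDS.intersection(extra_fields)
    if clashes:
        raise ValueError(f"extra_fields must not override built-in fields: {sorted(clashes)}")

    batch = []
    for text, vector, metadata in zip(texts, vectors, metadatas):
        record = {
            "@search.action": "upload",
            "id": str(uuid.uuid4()),           # random document key
            "content": text,
            "content_vector": vector,
            "metadata": json.dumps(metadata),  # metadata kept as a single JSON string
            **extra_fields,                    # same extra values on every record
        }
        batch.append(record)
    return batch


batch = build_upload_batch(
    ["first chunk", "second chunk"],
    [[0.1, 0.2], [0.3, 0.4]],
    extra_fields={"important_field_1": 123, "important_field_2": 456},
)
print(batch[0]["important_field_1"], batch[1]["important_field_2"])  # 123 456
```

Rejecting keys that collide with the built-in fields keeps a caller from accidentally overwriting the document key or the vector.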

Then, when the user queries this vector store later, they can do something like this:

retriever.search_kwargs = {"filters": "important_field_1 eq 123"}

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
)
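The filter string is an OData `$filter` expression. A small sketch of building such strings from a dict, so the values written as extra fields can be reused for querying; `odata_literal` and `build_eq_filter` are hypothetical helpers and only handle bool, int and str values.

```python
from typing import Any


def odata_literal(value: Any) -> str:
    """Render a Python value as an OData literal (bool, int or str only)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        # OData string literals use single quotes; an embedded quote is doubled.
        return "'" + value.replace("'", "''") + "'"
    raise TypeError(f"unsupported filter value type: {type(value).__name__}")


def build_eq_filter(fields: dict[str, Any]) -> str:
    """Hypothetical helper: AND together one `field eq value` clause per entry."""
    return " and ".join(f"{name} eq {odata_literal(value)}" for name, value in fields.items())


print(build_eq_filter({"important_field_1": 123}))
# important_field_1 eq 123
print(build_eq_filter({"important_field_1": 123, "owner": "O'Brien"}))
# important_field_1 eq 123 and owner eq 'O''Brien'
```

The result can then be assigned as `retriever.search_kwargs = {"filters": ...}`, as in the snippet above.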

Motivation

My motivation was a need in a project I'm working on, but I believe this is a generally useful feature. As stated in the feature request, filtering is hard to use effectively because add_texts only populates the id, content, content_vector, and metadata fields; none of these is suitable for filtering, so the user has to go back and add fields to the search index manually.

Your contribution

Hopefully this makes sense; let me know if any clarification is needed. Once bug #6131 is fixed, I will submit a PR that implements this. I have it working locally and just need to write appropriate unit tests, which will not be possible until that bug is fixed.
