Description
Feature request
Currently the AzureSearch VectorStore lets the user specify a filter that is applied to the search index (in the traditional search engine sense) before the vector similarity search runs. This reduces the search space, which improves speed and helps focus the vector search on the correct subset of documents.
This filtering feature is hard to use effectively because the current method for adding documents (add_texts) only populates the id, content, content_vector, and metadata fields. None of these fields is suitable for filtering, so the user has to go back and add fields to the search index manually.
I propose that we allow the end user to specify extra fields that are added to the index when these vectors are created. The end user would do something like this:
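# Proposed usage: extra_fields is applied to every document in this call
extra_fields = {"extra_fields": {"important_field_1": 123, "important_field_2": 456}}
# doc1, doc2, and doc3 are existing Document objects
documents = []
documents.append(doc1)
documents.append(doc2)
documents.append(doc3)
vector_store.add_documents(documents, **extra_fields)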
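To make the proposal concrete, here is a minimal sketch of how the extra fields might be merged into the record that gets uploaded for each document. The helper name `build_index_document`, the `RESERVED_FIELDS` set, and the choice to store metadata as a JSON string are assumptions for illustration only, not the actual LangChain implementation.

```python
import json
from typing import Any, Dict, List, Optional

# Core field names as listed in this issue; the real index schema may differ.
RESERVED_FIELDS = {"id", "content", "content_vector", "metadata"}


def build_index_document(
    doc_id: str,
    text: str,
    embedding: List[float],
    metadata: Dict[str, Any],
    extra_fields: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build one upload record with extra top-level fields."""
    record: Dict[str, Any] = {
        "id": doc_id,
        "content": text,
        "content_vector": embedding,
        # Assumption: metadata is serialized to a single JSON string field.
        "metadata": json.dumps(metadata),
    }
    for name, value in (extra_fields or {}).items():
        # Refuse to silently overwrite one of the core fields.
        if name in RESERVED_FIELDS:
            raise ValueError(f"extra field {name!r} collides with a core field")
        record[name] = value
    return record


record = build_index_document(
    "1",
    "hello world",
    [0.1, 0.2, 0.3],
    {"source": "a.txt"},
    extra_fields={"important_field_1": 123, "important_field_2": 456},
)
```

Because the extra fields end up as top-level fields of the record, they can be targeted by a filter expression, provided the index defines them as filterable.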
Then, when the user queries this vector store later, they can do something like this:
retriever.search_kwargs = {'filters': "important_field_1 eq 123"}
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
)
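The filter string above uses OData comparison syntax. As a rough sketch, a small helper could build such expressions from Python values instead of hand-writing the string. The names `odata_eq` and `odata_and` are hypothetical and are not part of LangChain or the Azure SDK.

```python
from typing import Union

Scalar = Union[bool, int, float, str]


def odata_eq(field: str, value: Scalar) -> str:
    """Hypothetical helper: render `field eq value` as an OData filter clause."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        literal = "true" if value else "false"
    elif isinstance(value, (int, float)):
        literal = str(value)
    elif isinstance(value, str):
        # OData string literals use single quotes; embedded quotes are doubled.
        literal = "'" + value.replace("'", "''") + "'"
    else:
        raise TypeError(f"unsupported filter value type: {type(value).__name__}")
    return f"{field} eq {literal}"


def odata_and(*clauses: str) -> str:
    """Combine clauses with `and`, parenthesizing each one."""
    return " and ".join(f"({clause})" for clause in clauses)


single = odata_eq("important_field_1", 123)
combined = odata_and(
    odata_eq("important_field_1", 123),
    odata_eq("important_field_2", 456),
)
```

The resulting string could then be assigned in the same way as the hand-written filter, for example `retriever.search_kwargs = {'filters': single}`.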
Motivation
I need this for a project I'm working on, but I think it is a generally useful feature. As I stated in the feature request:
This filtering feature is hard to use effectively because the current method for adding documents (add_texts) only populates the id, content, content_vector, and metadata fields. None of these fields is suitable for filtering, so the user has to go back and add fields to the search index manually.
Your contribution
Hopefully this makes sense; let me know if any clarifications are needed. Once bug #6131 is fixed, I will submit a PR that implements this. I have it working locally and just need to write appropriate unit tests, which will not be possible until that bug is fixed.