@jupyterjazz jupyterjazz commented Jun 12, 2023

DocArray as a Retriever

DocArray is an open-source tool for managing multi-modal data. It lets you store and search your data using various document index backends. This PR introduces DocArrayRetriever, which works with any available backend and serves as a retriever for LangChain apps.

Also, I added 2 notebooks:
DocArray Backends - intro to all 5 currently supported backends, how to initialize, index, and use them as a retriever
DocArray Usage - showcasing what additional search parameters you can pass to create versatile retrievers

Example:

from docarray.index import InMemoryExactNNIndex
from docarray import BaseDoc, DocList
from docarray.typing import NdArray
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.retrievers import DocArrayRetriever


# define document schema
class MyDoc(BaseDoc):
    description: str
    description_embedding: NdArray[1536]


embeddings = OpenAIEmbeddings()
# create documents
descriptions = ["description 1", "description 2"]
desc_embeddings = embeddings.embed_documents(texts=descriptions)
docs = DocList[MyDoc](
    [
        MyDoc(description=desc, description_embedding=embedding)
        for desc, embedding in zip(descriptions, desc_embeddings)
    ]
)

# initialize document index with data
db = InMemoryExactNNIndex[MyDoc](docs)

# create a retriever
retriever = DocArrayRetriever(
    index=db,
    embeddings=embeddings,
    search_field="description_embedding",
    content_field="description",
)

# find the relevant document
doc = retriever.get_relevant_documents("action movies")
print(doc)
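
For intuition, here is a minimal pure-Python sketch of what an exact nearest-neighbour retriever does conceptually: embed the query, score it against every stored embedding, and return the content field of the best matches. This is not how DocArray or LangChain implement it. `ToyDoc`, `toy_embed`, `cosine`, and `retrieve` are hypothetical names made up for illustration, and the letter-frequency "embedding" is only a stand-in for a real model such as OpenAIEmbeddings.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class ToyDoc:
    # Mirrors the two fields of MyDoc above: the text and its embedding.
    description: str
    description_embedding: List[float]


def toy_embed(text: str) -> List[float]:
    """Hypothetical stand-in for an embedding model: letter-frequency counts."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query: str,
    docs: Sequence[ToyDoc],
    embed: Callable[[str], List[float]],
    top_k: int = 1,
) -> List[str]:
    """Exhaustively score every document and return the top_k descriptions."""
    query_vec = embed(query)
    ranked = sorted(
        docs,
        key=lambda d: cosine(query_vec, d.description_embedding),
        reverse=True,
    )
    return [d.description for d in ranked[:top_k]]


corpus = ["action movies with car chases", "quiet romantic dramas"]
toy_docs = [ToyDoc(text, toy_embed(text)) for text in corpus]
result = retrieve("action movies", toy_docs, toy_embed, top_k=1)
print(result)
```

The sketch scans every document on each query, which is the "exact" part of an exact nearest-neighbour search. Approximate backends trade that guarantee for speed on large collections.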

Who can review?

@dev2049

Signed-off-by: jupyterjazz <[email protected]>
@jpzhangvincent
Contributor

It would be nice to also add jina's annlite for the vector store option as well.

@jupyterjazz
Contributor Author

hey @jpzhangvincent, annlite is not yet compatible with the new docarray version, but we might do it in the future, thanks for the suggestion!

Contributor

@hwchase17 hwchase17 left a comment

We don't need two separate notebooks about DocArray in the retrievers section.

@hwchase17 hwchase17 added the lgtm label Jun 16, 2023
Signed-off-by: jupyterjazz <[email protected]>
@vercel

vercel bot commented Jun 16, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| langchain | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Jun 16, 2023 7:45pm |

@vercel vercel bot temporarily deployed to Preview June 16, 2023 08:00 Inactive
@jupyterjazz jupyterjazz requested a review from hwchase17 June 16, 2023 08:16
@jupyterjazz
Contributor Author

@hwchase17 @vowelparrot @dev2049

I'm not sure why Vercel is failing; I think it fails for all other recent PRs as well.

@vercel vercel bot temporarily deployed to Preview June 16, 2023 18:52 Inactive
@vercel vercel bot temporarily deployed to Preview June 16, 2023 19:29 Inactive
@vercel vercel bot temporarily deployed to Preview June 16, 2023 19:45 Inactive
Signed-off-by: jupyterjazz <[email protected]>
@vercel

vercel bot commented Jun 16, 2023

@jupyterjazz is attempting to deploy a commit to the LangChain Team on Vercel.

A member of the Team first needs to authorize it.

@jupyterjazz
Contributor Author

hey @hwchase17 @vowelparrot @dev2049

I think Vercel needs some approval from your side and CI should be green afterwards. The comment about separate notebooks is addressed!

@hwchase17 hwchase17 merged commit 427551e into langchain-ai:master Jun 17, 2023
This was referenced Jun 25, 2023