Skip to content

Conversation

@JaysonAlbert
Copy link
Contributor

@JaysonAlbert JaysonAlbert commented Jun 9, 2023

the current implement put the doc itself as the metadata, but the document chatgpt plugin retriever returned already has a metadata field, it's better to use that instead.

the original code will throw the following exception when using RetrievalQAWithSourcesChain, becuse it can not find the field metadata:

Exception has occurred: ValueError       (note: full exception trace is shown but execution is paused at: _run_module_as_main)
Document prompt requires documents to have metadata variables: ['source']. Received document with missing metadata: ['source'].
  File "/home/wangjie/anaconda3/envs/chatglm/lib/python3.10/site-packages/langchain/chains/combine_documents/base.py", line 27, in format_document
    raise ValueError(
  File "/home/wangjie/anaconda3/envs/chatglm/lib/python3.10/site-packages/langchain/chains/combine_documents/stuff.py", line 65, in <listcomp>
    doc_strings = [format_document(doc, self.document_prompt) for doc in docs]
  File "/home/wangjie/anaconda3/envs/chatglm/lib/python3.10/site-packages/langchain/chains/combine_documents/stuff.py", line 65, in _get_inputs
    doc_strings = [format_document(doc, self.document_prompt) for doc in docs]
  File "/home/wangjie/anaconda3/envs/chatglm/lib/python3.10/site-packages/langchain/chains/combine_documents/stuff.py", line 85, in combine_docs
    inputs = self._get_inputs(docs, **kwargs)
  File "/home/wangjie/anaconda3/envs/chatglm/lib/python3.10/site-packages/langchain/chains/combine_documents/base.py", line 84, in _call
    output, extra_return_dict = self.combine_docs(
  File "/home/wangjie/anaconda3/envs/chatglm/lib/python3.10/site-packages/langchain/chains/base.py", line 140, in __call__
    raise e

Additionally, the metadata filed in the chatgpt plugin retriever have these fileds by default:

{
    "source":  "file",   //email, file or chat
    "source_id": "filename.docx", // the filename
    "url": "", 
    ...
}

so, we should set source_id to source in the langchain metadata.

metadata = d.pop("metadata", d)
if(metadata.get("source_id")):
    metadata["source"] = metadata.pop("source_id")

Who can review?

@dev2049

Copy link
Contributor

@hwchase17 hwchase17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm - thanks

@hwchase17 hwchase17 added the lgtm label Jun 10, 2023
@hwchase17 hwchase17 merged commit 50d9c7d into langchain-ai:master Jun 16, 2023
This was referenced Jun 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants