Replies: 1 comment
Status: Open. For exact matches (name and content), it seems to return duplicate.
I'm working on uploading files, and I ran into a case where two PPT files had the same text content but differed in other ways when opened in PowerPoint. However, since the text was the same, the second file was marked as a duplicate.
The immediate problem was that the track id for the "duplicate" didn't resolve to anything, so there was no way to find out what had happened through the API.
Talking this over with a colleague, we thought the doc id could perhaps be based on the MD5 of the whole file rather than just the text content. You would end up with duplicate chunks and so on, but it is an option. The other idea was to add a status such as "duplicate" that behaves like a failure: nothing is inserted or chunked, but the track id still resolves and a status is available for review.
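To make the two ideas concrete, here is a rough Python sketch. It is not the project's actual code: `file_doc_id`, `text_doc_id`, `DocStatus`, `TrackRecord`, and `Tracker` are all hypothetical names made up for illustration, and the real id scheme and storage may look quite different.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class DocStatus(Enum):
    PROCESSED = "processed"
    DUPLICATE = "duplicate"  # hypothetical status, not an existing one


@dataclass
class TrackRecord:
    track_id: str
    doc_id: str
    status: DocStatus
    duplicate_of: Optional[str] = None  # track id of the first upload, if any


def text_doc_id(text: str) -> str:
    """Id derived from extracted text only (the behaviour described above)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def file_doc_id(raw_bytes: bytes) -> str:
    """Idea 1: id derived from the MD5 of the whole file."""
    return hashlib.md5(raw_bytes).hexdigest()


class Tracker:
    """Idea 2: every upload gets a resolvable record, duplicates included."""

    def __init__(self) -> None:
        self.first_upload: Dict[str, str] = {}   # doc_id -> first track_id
        self.tracks: Dict[str, TrackRecord] = {}  # track_id -> record

    def upload(self, track_id: str, doc_id: str) -> TrackRecord:
        if doc_id in self.first_upload:
            # Skip insert/chunking, but still record what happened.
            record = TrackRecord(track_id, doc_id, DocStatus.DUPLICATE,
                                 duplicate_of=self.first_upload[doc_id])
        else:
            self.first_upload[doc_id] = track_id
            record = TrackRecord(track_id, doc_id, DocStatus.PROCESSED)
        self.tracks[track_id] = record
        return record
```

With something like this, two files that share the same extracted text but differ in their bytes would get different ids under `file_doc_id`, and under the second idea a lookup by the duplicate's track id would return a record with a "duplicate" status instead of nothing.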
Has anyone else run into issues like this? I'd welcome feedback on this and on how to handle it.