Replies: 1 comment
-
Thanks for your report. As mentioned in our docs, text extraction is hard. You might want to have a look at the different parameters described there and try different variants. If you want to analyze this further, feel free to do so. We appreciate PRs that improve the general accuracy of the extraction where there is actual room for improvement.
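To make "try different variants" concrete, here is a minimal sketch that runs `extract_text()` with a couple of parameter sets and prints the results side by side. It assumes a recent pypdf release in which `extract_text()` accepts the `extraction_mode` parameter. The helper names (`VARIANTS`, `extract_variants`, `first_page`, `report`) are made up for illustration and are not part of pypdf. Whether any variant removes the stray spaces depends on the PDF itself.

```python
from typing import Any, Dict

# Keyword-argument sets to try. "plain" is the default mode; "layout" is an
# alternative mode (assumption: the installed pypdf version supports the
# extraction_mode parameter).
VARIANTS: Dict[str, Dict[str, Any]] = {
    "plain": {"extraction_mode": "plain"},
    "layout": {"extraction_mode": "layout"},
}


def extract_variants(page: Any) -> Dict[str, str]:
    """Call page.extract_text() once per variant and collect the results."""
    return {name: page.extract_text(**kwargs) for name, kwargs in VARIANTS.items()}


def first_page(path: str) -> Any:
    """Open a PDF with pypdf and return its first page (pypdf imported lazily)."""
    from pypdf import PdfReader

    return PdfReader(path).pages[0]


def report(path: str) -> None:
    """Print the text produced by each variant for the first page of the PDF."""
    for name, text in extract_variants(first_page(path)).items():
        print(f"--- {name} ---")
        print(text)
```

Calling `report("test.pdf")` would print one block per variant, which makes it easy to see which setting, if any, handles the affected phrase better.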
-
I am extracting text from a PDF file with pypdf and it seems buggy. The extracted text contains words with spaces that do not exist in the source PDF.
Example:
PDF contains phrase
- Postchirurgie (strabisme, ptérygion)
Extracted text contains
- P ostchirurgie (strabism e ,ptérygion)
Why are these extra spaces inside the words?
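As a quick sanity check, the two strings above can be compared with all whitespace removed. This is only a sketch for narrowing down the report (the function name is made up), not a fix: it confirms whether the extraction differs from the source in spacing alone or also in characters.

```python
def same_ignoring_whitespace(expected: str, extracted: str) -> bool:
    """Return True if the two strings are equal once all whitespace is dropped."""
    # str.split() with no argument splits on any run of whitespace,
    # so joining the pieces removes every space, tab and newline.
    return "".join(expected.split()) == "".join(extracted.split())


expected = "Postchirurgie (strabisme, ptérygion)"
extracted = "P ostchirurgie (strabism e ,ptérygion)"

# True here means only the spacing differs, not the characters themselves.
print(same_ignoring_whitespace(expected, extracted))
```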
Environment
Which environment were you using when you encountered the problem?
Code + PDF
This is a minimal, complete example that shows the issue:
test.pdf