Doc-Textify is a TypeScript library and command-line tool that extracts and cleans text from various document formats.
- Multi-format support:
  - Microsoft Word (.docx)
  - PowerPoint (.pptx)
  - Excel (.xlsx)
  - OpenOffice/LibreOffice (.odt, .odp, .ods)
  - PDF (.pdf)
  - Plain text (.txt)
  - HTML (.html, .htm)
- Content cleaning: removes extra whitespace, handles custom line delimiters.
- Configurable options: set newline delimiter, minimum characters to extract, and toggle error logging.
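To make the "content cleaning" feature more concrete, here is a minimal sketch of what whitespace cleanup with a custom line delimiter could look like. The function name `cleanText` and its exact rules (collapse runs of spaces and tabs, trim each line, drop empty lines) are assumptions made for illustration; they are not taken from the Doc-Textify source, and the library's actual behavior may differ.

```typescript
// Hypothetical helper, NOT part of the doc-textify API.
// Assumed rules: collapse spaces/tabs, trim lines, drop blank lines,
// then join the remaining lines with the requested delimiter.
function cleanText(raw: string, newlineDelimiter: string = "\n"): string {
  return raw
    .split(/\r\n|\r|\n/) // accept Windows, old Mac, and Unix line endings
    .map((line) => line.replace(/[ \t]+/g, " ").trim())
    .filter((line) => line.length > 0)
    .join(newlineDelimiter);
}

// Example: messy input is normalized to two clean lines.
const cleaned = cleanText("  Hello   world \r\n\r\n  second\tline  ");
console.log(JSON.stringify(cleaned));
```

Joining with the delimiter as the last step keeps the cleanup rules independent of the output format, so the same logic works for `"\n"`, `"\r\n"`, or any other separator.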
Install the package and import it in your project:
npm install doc-textify --save
import { docTextify } from 'doc-textify'
// async/await version
try {
  const text = await docTextify('path/to/file.pdf')
  console.log(text)
} catch (err) {
  console.error(err)
}
// or promise-chain version
docTextify('path/to/file.pdf')
  .then(text => console.log(text))
  .catch(err => console.error(err))
Default options:
try {
  const text = await docTextify('path/to/file.pdf', {
    newlineDelimiter: '\n', // delimiter inserted between lines of output
    minCharsToExtract: 0, // minimum number of characters required to output the content; 0 disables the check
    outputErrorToConsole: true // log errors to the console
  })
} catch (err) {
  console.error(err)
}
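If you only want to override one or two settings, a small helper that merges overrides onto the documented defaults can keep call sites short. The sketch below is an assumption-laden illustration: the `ExtractOptions` interface and `resolveOptions` function are hypothetical names defined locally, and only the three option names and their default values come from the section above.

```typescript
// Hypothetical local type mirroring the three documented options.
// It is NOT imported from doc-textify.
interface ExtractOptions {
  newlineDelimiter: string;
  minCharsToExtract: number;
  outputErrorToConsole: boolean;
}

// Default values as listed in the documentation above.
const DEFAULT_OPTIONS: ExtractOptions = {
  newlineDelimiter: "\n",
  minCharsToExtract: 0,
  outputErrorToConsole: true,
};

// Merge caller overrides onto the defaults; later keys win.
function resolveOptions(overrides: Partial<ExtractOptions> = {}): ExtractOptions {
  return { ...DEFAULT_OPTIONS, ...overrides };
}

// Example: change only the delimiter and silence console logging.
const options = resolveOptions({ newlineDelimiter: "\r\n", outputErrorToConsole: false });
console.log(options);
```

The resolved object could then be passed as the second argument, for example `docTextify('path/to/file.pdf', options)`.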
If you prefer a ready-made command, the doc-textify CLI wraps the same functionality.
Install globally to use the doc-textify command anywhere:
npm install -g doc-textify
Or install locally:
npm install doc-textify --save
doc-textify <path/to/document> [options]
| Option | Description | Default |
|---|---|---|
| -n, --newlineDelimiter | Line delimiter to insert | "\n" |
| -m, --minCharsToExtract | Minimum number of characters to extract | 0 (disabled) |
| -h, --help | Display help message | - |
doc-textify document.docx -n "\r\n" -m 20 > output.txt
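For readers curious how flags like these map onto the library options, the following is a rough sketch of an argument parser for the three documented flags. It is not the actual CLI implementation: `parseCliArgs`, `unescapeDelimiter`, and the `CliArgs` type are hypothetical names, and the step that turns a literal `\r\n` typed in the shell into real control characters is an assumption about how such a CLI might behave.

```typescript
// Hypothetical result shape; not exported by doc-textify.
interface CliArgs {
  file?: string;
  newlineDelimiter: string;
  minCharsToExtract: number;
  help: boolean;
}

// Assumption: escape sequences typed literally in the shell are converted
// to the control characters they name.
function unescapeDelimiter(value: string): string {
  return value.replace(/\\r/g, "\r").replace(/\\n/g, "\n").replace(/\\t/g, "\t");
}

function parseCliArgs(argv: string[]): CliArgs {
  const result: CliArgs = { newlineDelimiter: "\n", minCharsToExtract: 0, help: false };
  const rest = [...argv]; // copy so the caller's array is left untouched
  while (rest.length > 0) {
    const arg = rest.shift() ?? "";
    if (arg === "-h" || arg === "--help") {
      result.help = true;
    } else if (arg === "-n" || arg === "--newlineDelimiter") {
      result.newlineDelimiter = unescapeDelimiter(rest.shift() ?? "");
    } else if (arg === "-m" || arg === "--minCharsToExtract") {
      const parsed = Number.parseInt(rest.shift() ?? "", 10);
      if (Number.isNaN(parsed) || parsed < 0) {
        throw new Error("minCharsToExtract must be a non-negative integer");
      }
      result.minCharsToExtract = parsed;
    } else if (result.file === undefined) {
      result.file = arg; // first non-flag argument is the document path
    }
  }
  return result;
}

// Mirrors the example command: doc-textify document.docx -n "\r\n" -m 20
const parsedArgs = parseCliArgs(["document.docx", "-n", "\\r\\n", "-m", "20"]);
console.log(parsedArgs.file, parsedArgs.minCharsToExtract);
```

In a real Node.js entry point the input would typically be `process.argv.slice(2)`.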
git clone https://github.com/johaven/doc-textify.git
cd doc-textify
npm install
npm run build # outputs compiled files into /dist
npm run test # test parsing
- Fork the repository
- Create a branch: git checkout -b feature/my-feature
- Commit your changes: git commit -m "Add my feature"
- Push to your branch: git push origin feature/my-feature
- Open a Pull Request
This project is licensed under the MIT License.