Table of Contents
- Intro
- Compatibility
- Working Conversions
- On-the-fly Extraction of Archive Files
- User Preferences
- Performance
- Screenshots
- Other Nextcloud PDF Converters
- Todo, some problems I am aware of
Intro
This is an app for the Nextcloud cloud software. It adds a new entry to the actions menu of each folder, archive, or individual file in the files view. This entry lets you download entire directory trees, all files contained in an archive, or individual files, converted and assembled into a single PDF file. Additionally, it adds a tab to the details view where various actions can be performed.
For the PDF generation, the following steps are performed:
- walk through the given folder
- convert all found files to PDF
  - optionally traverse archive files (zip etc.) transparently
  - handle some special cases
  - try to convert the remaining files with unoconv or an admin-provided fallback script
  - generate a PDF placeholder error page for each failed conversion
- then combine all found or generated PDF files into one document using pdftk
  - add bookmarks to mark the start of each folder and each file
  - existing bookmarks are "shifted down" accordingly
  - the resulting bookmark structure resembles the folder structure
  - optionally place a "Folder PAGE/MAX_PAGES" label at the top of each page
- finally, present the generated PDF as a download or save it to the cloud file system.
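The bookmark steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the app's own (PHP) code: each converted file is described by a hypothetical entry (path relative to the converted folder, page count, bookmarks already present in that PDF), and the entries are assumed to be sorted so that the files of each folder are contiguous. Folder bookmarks point at the first page of their first file, and a file's own bookmarks are nested one level below the file bookmark with their page numbers offset. The names Bookmark, build_bookmarks and to_pdftk_info are made up for this sketch.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List, Sequence, Tuple


@dataclass
class Bookmark:
    title: str
    level: int  # 1 = top level, as in pdftk's BookmarkLevel
    page: int   # 1-based page number within the document it belongs to


# Hypothetical entry: (path relative to the converted folder, page count,
# bookmarks already contained in that file's PDF)
Entry = Tuple[str, int, Sequence[Bookmark]]


def build_bookmarks(entries: Sequence[Entry]) -> List[Bookmark]:
    """Bookmarks for the concatenation of `entries`, mirroring the folder tree."""
    result: List[Bookmark] = []
    seen_folders = set()
    page = 1  # start page of the current file in the combined PDF
    for path, page_count, existing in entries:
        parts = PurePosixPath(path).parts
        # One bookmark per folder, placed where its first file starts.
        for depth in range(1, len(parts)):
            folder = parts[:depth]
            if folder not in seen_folders:
                seen_folders.add(folder)
                result.append(Bookmark(parts[depth - 1], depth, page))
        file_level = len(parts)
        result.append(Bookmark(parts[-1], file_level, page))
        # "Shift down" the file's own bookmarks: nest them below the file
        # bookmark and translate their page numbers into the combined PDF.
        for bm in existing:
            result.append(Bookmark(bm.title, file_level + bm.level, page + bm.page - 1))
        page += page_count
    return result


def to_pdftk_info(bookmarks: Sequence[Bookmark]) -> str:
    """Render bookmarks in pdftk's textual bookmark data format."""
    lines: List[str] = []
    for bm in bookmarks:
        lines += [
            "BookmarkBegin",
            f"BookmarkTitle: {bm.title}",
            f"BookmarkLevel: {bm.level}",
            f"BookmarkPageNumber: {bm.page}",
        ]
    return "\n".join(lines) + "\n"
```

The text produced by to_pdftk_info follows the bookmark format that pdftk reads, so it could be applied with something like "pdftk combined.pdf update_info bookmarks.txt output final.pdf" (or update_info_utf8 for non-ASCII titles). The README does not say whether the app hands bookmarks to pdftk this way.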
The app offers the choice between online and background PDF generation. "Background" means that a job is scheduled, and then runs independently of the web browser frontend. The user will be notified after the job has been completed.
Compatibility
Please refer to the app's metadata.
PDFTk is a required dependency and must be installed. It is available in most Linux distributions. Unfortunately, the most recent release has some known issues. To my knowledge, the version installed on Gentoo Linux works just fine; Gentoo seemingly uses the branch bc176. So if you encounter blank pages or pages with "strange" dimensions, accept the challenge: install PDFTk from its source distribution and then try again.
Working Conversions
- PDF files ;) of course, these are just passed through
- office files via LibreOffice
  - this needs unoserver or unoconv to be installed, see https://github.com/unoconv/unoserver
- HTML files via
  - wkhtmltopdf if installed; this program is no longer maintained but yields very good results
  - weasyprint if installed; a Python library, actively maintained and available in most Linux distributions
  - pandoc; well, this yields very poor results because all CSS attributes are ignored
- EML (RFC822) files, i.e. emails you saved to disk, via mhonarc
  - then the HTML to PDF conversion chain is run
- TIFF files via tiff2pdf
- PostScript files via ps2pdf
- everything else is passed to unoconvert (or unoconv if the newer unoserver is not installed)
  - if unoconvert/unoconv fails, a PDF placeholder error page is generated
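The per-type dispatch in the list above can be pictured as a lookup table of candidate converter pipelines, tried in order. The following Python sketch is an illustration only; the table, the MIME type keys and the function name candidates are assumptions, not the app's actual implementation. Each pipeline is a list of programs run one after another (mhonarc followed by an HTML converter for EML files), an empty pipeline means pass-through, and pipelines whose programs are not installed are dropped. An empty result corresponds to the placeholder error page.

```python
import shutil
from typing import Callable, Dict, List

Pipeline = List[str]  # programs run one after another; [] means pass-through

# Hypothetical table: candidate pipelines per MIME type, tried in order.
HTML: List[Pipeline] = [["wkhtmltopdf"], ["weasyprint"], ["pandoc"]]
BUILTIN: Dict[str, List[Pipeline]] = {
    "application/pdf": [[]],  # PDF files are passed through unchanged
    "text/html": HTML,
    "message/rfc822": [["mhonarc"] + p for p in HTML],  # EML -> HTML -> PDF
    "image/tiff": [["tiff2pdf"]],
    "application/postscript": [["ps2pdf"]],
}


def on_path(program: str) -> bool:
    """Treat a program as installed if it can be found on $PATH."""
    return shutil.which(program) is not None


def candidates(mime_type: str, is_installed: Callable[[str], bool] = on_path) -> List[Pipeline]:
    pipelines = BUILTIN.get(mime_type)
    if pipelines is None:
        # Everything else goes to LibreOffice; prefer unoserver's unoconvert.
        pipelines = [["unoconvert"]] if is_installed("unoconvert") else [["unoconv"]]
    # Keep only pipelines whose programs are all installed.
    return [p for p in pipelines if all(is_installed(prog) for prog in p)]
```

Injecting is_installed keeps the lookup testable without any converter actually being present on the machine.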
Administrators may specify a shell script or program for
- default conversion: try this script before any other converter; if it fails, continue with the builtin converters
- fallback conversion: if all other converters fail, try the given script as a fallback; if that also fails, generate an error page
If no fallback converter is configured, unoconv is used as the fallback.
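The ordering of default script, builtin converters and fallback can be summarized in a short Python sketch. It assumes, for illustration only, that every converter is modelled as a callable that takes a file path and returns PDF bytes or None on failure; in the app they wrap external programs and admin scripts. The function and parameter names are hypothetical.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical model: a converter returns PDF bytes, or None on failure.
Converter = Callable[[str], Optional[bytes]]


def convert(
    path: str,
    builtin: Sequence[Converter],
    unoconv: Converter,
    error_page: Callable[[str], bytes],
    default_script: Optional[Converter] = None,
    fallback_script: Optional[Converter] = None,
) -> bytes:
    chain: List[Converter] = []
    if default_script is not None:
        chain.append(default_script)  # tried before any builtin converter
    chain.extend(builtin)
    # The admin's fallback script, or unoconv if none is configured.
    chain.append(fallback_script if fallback_script is not None else unoconv)
    for converter in chain:
        try:
            pdf = converter(path)
        except Exception:
            pdf = None  # a crashing converter counts as a failed conversion
        if pdf is not None:
            return pdf
    # Every converter failed: return a placeholder page instead of aborting.
    return error_page(path)
```

Returning a placeholder instead of raising means one unconvertible file does not spoil the whole combined PDF.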
On-the-fly Extraction of Archive Files
If enabled by an admin, users can choose to enable on-the-fly extraction of archive files.
- to reduce the danger of zip bombs somewhat, there is a hard-coded upper limit on the decompressed archive size
- administrators can lower this limit to reduce resource usage on the server, or if they feel that the built-in limit of 2^30 bytes (1 GiB) is too high
- users may decrease this limit further on a per-user basis
- the feature may be disabled by administrators altogether
- if it is enabled, users may decide by themselves whether to use it or not
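The limit rules above boil down to taking the minimum of the hard-coded, admin and user limits and comparing it with the decompressed size. The Python sketch below illustrates this for ZIP files only, using the standard zipfile module; the app itself relies on wapmorgan/unified-archive, and the names HARD_LIMIT, effective_limit and zip_within_limit are hypothetical. Note that the sizes summed here are the ones declared in the archive's own directory, which a malicious archive can forge, so a robust check would also count bytes during extraction.

```python
import zipfile
from typing import Optional

HARD_LIMIT = 2**30  # built-in upper bound on the decompressed size (1 GiB)


def effective_limit(admin_limit: Optional[int] = None, user_limit: Optional[int] = None) -> int:
    """Admins and users can only lower the limit, never raise it."""
    limits = [HARD_LIMIT]
    limits += [x for x in (admin_limit, user_limit) if x is not None]
    return min(limits)


def zip_within_limit(archive, limit: int) -> bool:
    """Compare the sum of the declared uncompressed sizes with `limit`."""
    with zipfile.ZipFile(archive) as zf:
        total = sum(info.file_size for info in zf.infolist())
    return total <= limit
```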
This package relies on wapmorgan/unified-archive as the archive handling backend. Please refer to its documentation for a list of supported archive formats and for how to add support for further formats.
User Preferences
The app allows configuring page labels as well as the automatically generated download and destination file names through user-defined templates. The details can be found in Braced Text Templates.
- the fonts can be chosen from the list of fonts shipped with tcpdf
- the backend generates font samples for the chosen fonts and also provides a preview of the configured page labels rendered with the chosen font
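To give a rough idea of how a braced template turns into a page label such as "Folder PAGE/MAX_PAGES", here is a minimal Python sketch of brace substitution. The real template syntax and the available keys are documented in Braced Text Templates; the keys DIR, PAGE and MAX_PAGES and the function expand used here are hypothetical.

```python
import re
from typing import Mapping

# Hypothetical placeholder syntax: {KEY} with KEY made of word characters.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def expand(template: str, values: Mapping[str, object]) -> str:
    """Replace each {KEY} by values[KEY]; unknown keys are kept verbatim."""
    return PLACEHOLDER.sub(lambda m: str(values.get(m.group(1), m.group(0))), template)
```

For example, expand("{DIR} {PAGE}/{MAX_PAGES}", {"DIR": "reports", "PAGE": 3, "MAX_PAGES": 12}) yields "reports 3/12".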
Files can be included or excluded by regular expressions, and a setting controls which of the two patterns takes precedence when both match. Unfortunately, those patterns cannot (yet) be controlled from the "details" panel.
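The precedence rule can be sketched in a few lines of Python. This is an illustration, not the app's code: is_included and its parameters are hypothetical, and the behaviour when neither pattern matches (keep the file only if no include pattern is configured) is an assumption the README does not spell out.

```python
import re
from typing import Optional


def is_included(path: str, include: Optional[str] = None, exclude: Optional[str] = None,
                include_wins: bool = True) -> bool:
    inc = include is not None and re.search(include, path) is not None
    exc = exclude is not None and re.search(exclude, path) is not None
    if inc and exc:
        return include_wins  # the precedence setting decides conflicts
    if exc:
        return False
    # Assumption: with an include pattern set, only matching files are kept;
    # without one, everything that is not excluded is kept.
    return inc or include is None
```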
If enabled by the administrators, users can optionally disable the on-the-fly handling of archive files and further restrict the archive size limit imposed by the admins.
Optionally, individual files (as opposed to directory trees and archives) can be converted to PDF directly. This feature is enabled by default. The drawback is that it adds an actions menu entry to every file system node, even to PDF files themselves.
Performance
- Unfortunately, the app is not the fastest horse one could think of. In particular, the unoconv (LibreOffice) converter tends to be somewhat slow, and conversion time naturally grows linearly with the number of files to be converted.
- If you do not want to make use of the background PDF generation, it might be necessary to tweak your web server to allow for longer execution times (several minutes).
Other Nextcloud PDF Converters
At least two other apps are either dedicated to PDF conversion or offer it as a feature:
- nextcloud/workflow_pdf_converter
  - this app is dedicated to automated PDF conversion based on workflow rules
  - at the time of this writing, conversion is done with LibreOffice
- newroco/emlviewer
  - as the name states, this is a viewer module for .eml files (emails); the EML view also provides a PDF download button
  - at the time of this writing, PDF conversion is done with mPDF
Todo, some problems I am aware of
- please feel free to submit issues!
- ZIP-bomb detection might need improvement
- There is no test suite. This is really an issue.