@nddipiazza nddipiazza commented Dec 26, 2025

Problem

Users had to set plugin-roots in the tika-config.json file, which made Docker deployments less flexible. Additionally, several plugin packaging issues prevented plugins from loading.

Solutions

1. Added --plugin-roots CLI parameter

Added a --plugin-roots command-line parameter that overrides the plugin-roots value from the config file. If the parameter is omitted, the config file value is used.

Usage:

java -jar tika-grpc.jar -c config.json --plugin-roots /tmp/tika-plugins
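
The override behavior described in this PR (a comma-separated list on the command line that takes precedence, with the config file as the fallback) can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the class `PluginRootsResolver` and the method `resolvePluginRoots` are hypothetical names, and the paths in `main` other than `/tmp/tika-plugins` are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class PluginRootsResolver {

    /**
     * Hypothetical helper (not the PR's actual code): returns the plugin roots
     * from the CLI value if one was given, otherwise the roots from the config file.
     */
    public static List<String> resolvePluginRoots(String cliPluginRoots,
                                                  List<String> configPluginRoots) {
        // No CLI value: fall back to whatever the config file declared.
        if (cliPluginRoots == null || cliPluginRoots.trim().isEmpty()) {
            return configPluginRoots;
        }
        // CLI value present: split the comma-separated list and drop blank entries.
        List<String> roots = new ArrayList<>();
        for (String part : cliPluginRoots.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                roots.add(trimmed);
            }
        }
        // A value made only of commas/whitespace is treated like "not specified".
        return roots.isEmpty() ? configPluginRoots : roots;
    }

    public static void main(String[] args) {
        // Illustrative config-file value; an assumption for this sketch.
        List<String> fromConfig = List.of("/opt/tika/plugins");
        System.out.println(resolvePluginRoots(null, fromConfig));
        System.out.println(resolvePluginRoots("/tmp/tika-plugins, /extra/plugins", fromConfig));
    }
}
```

Treating a blank CLI value the same as an absent one is a design choice of this sketch; the PR text only states that the config file is used when the parameter is not specified.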

2. Fixed plugin assembly descriptors

Fixed MANIFEST.MF packaging for 3 plugins (az-blob, gcs, jdbc) that were failing with "Cannot find the manifest path" errors.

Root cause: The assembly descriptors tried to include MANIFEST.MF from the classes/ directory, but the file only exists inside the JAR.

Solution: Use <dependencySet> with <unpack> to extract MANIFEST.MF from the project JAR.

3. Fixed GCS plugin.properties

Corrected plugin class reference:

  • ❌ Was: org.apache.tika.pipes.emitter.gcs.GCSEmitterPlugin
  • ✅ Now: org.apache.tika.pipes.plugin.gcs.GCSPipesPlugin
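
A wrong class name like this can be caught before deployment by checking that the class named in plugin.properties actually resolves. The sketch below is an illustration under assumptions, not part of this PR: it assumes the descriptor uses the `plugin.class` key (the PF4J convention for plugin.properties), and `PluginClassCheck` and `pluginClassResolves` are hypothetical names.

```java
import java.util.Properties;

public class PluginClassCheck {

    /**
     * Hypothetical check (not from the PR): returns true if the class named by
     * the "plugin.class" property can be found by the given class loader.
     * Assumes the PF4J-style "plugin.class" key in plugin.properties.
     */
    public static boolean pluginClassResolves(Properties props, ClassLoader loader) {
        String className = props.getProperty("plugin.class");
        if (className == null || className.trim().isEmpty()) {
            return false;
        }
        try {
            // initialize=false: only look the class up, do not run static initializers.
            Class.forName(className.trim(), false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = PluginClassCheck.class.getClassLoader();

        Properties props = new Properties();
        // Class name taken from the corrected plugin.properties; it only resolves
        // if the GCS plugin JAR is on this class loader's classpath.
        props.setProperty("plugin.class", "org.apache.tika.pipes.plugin.gcs.GCSPipesPlugin");
        System.out.println("plugin.class resolves: " + pluginClassResolves(props, loader));
    }
}
```

In practice the class loader passed in would need to see the plugin's JAR and its dependencies; running the check against the application class loader alone only validates classes already on the classpath.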

Results

✅ All 13 plugins now load successfully in tika-grpc-docker
✅ Flexible Docker/Kubernetes deployments
✅ Backward compatible

Testing

Verified with tika-grpc-docker that all plugins resolve and start.

Adds --plugin-roots command-line parameter to override plugin-roots from config file.

Problem:
Users had to set 'plugin-roots' in the tika-config.json file, which made
Docker deployments less flexible. Different environments might need different
plugin locations.

Solution:
- Added --plugin-roots CLI parameter to TikaGrpcServer
- Parameter accepts comma-separated list of plugin directories
- CLI parameter overrides config file if specified
- Falls back to config file if not specified

Changes:
- TikaGrpcServer: Added --plugin-roots parameter
- TikaGrpcServerImpl: Updated constructor to accept pluginRootsOverride
- TikaPluginManager: Added loadFromPaths() method for string-based paths

Usage:
java -jar tika-grpc.jar -c config.json --plugin-roots /tmp/tika-plugins

Or Docker:
docker run apache/tika-grpc:latest -c /config/config.json --plugin-roots /tmp/tika-plugins

Benefits:
- No need to modify config files for different environments
- Simplifies Docker/Kubernetes deployments
- Backward compatible - config file still works if CLI not specified

The assembly.xml was trying to include MANIFEST.MF from the classes/ directory,
but it only exists in the JAR file. Changed to use dependencySet with
unpack to properly extract MANIFEST.MF and extensions.idx from the
project artifact JAR into the plugin ZIP.

This fixes 'Cannot find the manifest path' errors for the 3 affected plugins
(az-blob, gcs, jdbc).

The GCS plugin.properties referenced the wrong class name:
  Wrong: org.apache.tika.pipes.emitter.gcs.GCSEmitterPlugin
  Correct: org.apache.tika.pipes.plugin.gcs.GCSPipesPlugin

This caused ClassNotFoundException when loading the GCS plugin.
@nddipiazza nddipiazza changed the title from "TIKA-4581: Add --plugin-roots CLI parameter for tika-grpc" to "TIKA-4581: Fix packaging issues and allow plugin-roots override" on Dec 27, 2025
@nddipiazza nddipiazza merged commit 70da6e0 into main Dec 27, 2025
4 checks passed