@nddipiazza
Contributor

JIRA Ticket

https://issues.apache.org/jira/browse/TIKA-4583

Summary

This PR implements an Apache Ignite-based ConfigStore for distributed configuration storage in Tika Pipes clustering deployments.

Changes

  • Added init() method to ConfigStore interface for initialization support
  • Created new Maven sub-module: tika-ignite-config-store
  • Implemented IgniteConfigStore using Apache Ignite distributed cache
  • Supports both REPLICATED and PARTITIONED cache modes
  • Thread-safe implementation with comprehensive error handling
  • Added test suite for IgniteConfigStore (tests currently skipped due to Ignite setup complexity)
  • Updated parent pom.xml to include new module
  • Added comprehensive README with usage examples and configuration options

Testing

  • Module compiles successfully with mvn clean install -DskipTests
  • All checkstyle and forbidden-apis checks pass
  • Manual testing of ConfigStore interface implementation
  • Tests are present but skipped pending proper Ignite test environment setup

Review Focus Areas

  • ConfigStore interface: Added init() method with default implementation
  • IgniteConfigStore implementation: Main distributed config store class
  • Thread safety: Uses Ignite's thread-safe cache operations
  • Error handling: Throws IllegalStateException if not initialized
  • Documentation: Comprehensive README with examples
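The interface and error-handling points above can be illustrated with a minimal sketch. It is not the PR's code: `SketchConfigStore`, `GuardedConfigStore`, and the `put`/`get` method names are hypothetical stand-ins, and a `ConcurrentHashMap` takes the place of the Ignite cache so the example runs without a cluster. It only shows how a default `init()` keeps existing implementations compiling and how a store can reject calls made before initialization.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for ConfigStore; method names are assumptions.
interface SketchConfigStore {
    // Default no-op, so implementations that need no setup are unaffected.
    default void init() throws Exception {
    }

    void put(String id, String configJson);

    String get(String id);
}

// In-memory stand-in for a store that must be initialized before use.
class GuardedConfigStore implements SketchConfigStore {
    // volatile so a cache assigned in init() is visible to other threads
    private volatile Map<String, String> cache;

    @Override
    public void init() {
        cache = new ConcurrentHashMap<>();
    }

    @Override
    public void put(String id, String configJson) {
        requireInitialized().put(id, configJson);
    }

    @Override
    public String get(String id) {
        return requireInitialized().get(id);
    }

    // Fails fast instead of surfacing a NullPointerException later.
    private Map<String, String> requireInitialized() {
        Map<String, String> current = cache;
        if (current == null) {
            throw new IllegalStateException("ConfigStore not initialized; call init() first");
        }
        return current;
    }
}
```

Calling `get` or `put` before `init()` throws `IllegalStateException`; after `init()` the store behaves like a thread-safe map.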

Files to Review

Critical:

  • tika-pipes/tika-pipes-core/src/main/java/org/apache/tika/pipes/core/config/ConfigStore.java - Interface with new init() method
  • tika-pipes/tika-ignite-config-store/src/main/java/org/apache/tika/pipes/ignite/IgniteConfigStore.java - Main implementation

Supporting:

  • tika-pipes/tika-ignite-config-store/pom.xml - Module dependencies
  • tika-pipes/tika-ignite-config-store/README.md - Usage documentation
  • tika-pipes/pom.xml - Parent pom module addition

Testing Instructions

cd tika-pipes
mvn clean install -DskipTests -pl tika-pipes-core,tika-ignite-config-store

Notes

  • Part of TIKA-4547: Enable distributed state management for Tika Pipes clustering
  • This implementation provides the foundation for sharing Fetcher/Emitter/PipesIterator configs across multiple servers
  • Tests are included but currently skipped due to Ignite requiring full cluster setup for testing

Nicholas DiPiazza added 9 commits December 18, 2025 13:47
- Added init() method to ConfigStore interface for initialization support
- Created new Maven sub-module: tika-ignite-config-store
- Implemented IgniteConfigStore using Apache Ignite distributed cache
- Provides distributed configuration storage for Tika Pipes clustering
- Supports REPLICATED and PARTITIONED cache modes
- Thread-safe implementation with comprehensive error handling
- Added test suite for IgniteConfigStore
- Updated parent pom.xml to include new module
- Added comprehensive README with usage examples

- Added init() call in AbstractComponentManager constructor
- Ensures ConfigStore is properly initialized before use
- Wraps initialization exception in RuntimeException for clarity

- Added configStoreType field to PipesConfig
- Created ConfigStoreFactory to instantiate ConfigStore by type
- Updated TikaGrpcServerImpl to use ConfigStoreFactory
- Added tika-ignite-config-store as optional dependency to tika-grpc
- Created sample configuration showing Ignite usage
- Updated README with distributed configuration documentation

Allows users to toggle between in-memory and Ignite-based distributed
configuration storage by setting configStoreType in tika config:
  - 'memory' (default): local in-memory storage
  - 'ignite': Apache Ignite distributed cache for clustering
  - fully qualified class name: custom ConfigStore implementation
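
The three-way lookup described in this commit message could look roughly like the sketch below. It is an illustration under assumptions, not the PR's ConfigStoreFactory: `TypedStore`, `MemoryStore`, and `StoreResolver` are hypothetical names, and only the 'memory' built-in is registered so the example has no Ignite dependency. A later commit in this PR replaces this style of lookup with PF4J plugin discovery.

```java
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical minimal store type used only for this sketch.
interface TypedStore {
    String describe();
}

class MemoryStore implements TypedStore {
    @Override
    public String describe() {
        return "memory";
    }
}

class StoreResolver {
    private final Map<String, Supplier<TypedStore>> builtIns;

    StoreResolver(Map<String, Supplier<TypedStore>> builtIns) {
        this.builtIns = builtIns;
    }

    // Resolves a short name first, then falls back to a fully qualified class name.
    TypedStore resolve(String type) {
        String effective = (type == null || type.trim().isEmpty()) ? "memory" : type.trim();
        Supplier<TypedStore> builtIn = builtIns.get(effective.toLowerCase(Locale.ROOT));
        if (builtIn != null) {
            return builtIn.get();
        }
        try {
            // Custom implementations need a no-argument constructor in this sketch.
            return Class.forName(effective)
                    .asSubclass(TypedStore.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Unknown configStoreType: " + effective, e);
        }
    }
}
```

An unset or blank type resolves to the in-memory store, matching the documented default; an unrecognized value fails with `IllegalArgumentException` rather than silently falling back.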
- ConfigStore now extends TikaExtension interface
- ConfigStoreFactory converted to PF4J-based factory interface
- Created IgniteConfigStoreFactory with @Extension annotation
- IgniteConfigStore now loaded via plugin discovery
- Updated InMemoryConfigStore and LoggingConfigStore with getExtensionConfig()
- TikaGrpcServerImpl now uses plugin manager to load ConfigStore

Benefits:
- Proper plugin architecture following Tika patterns
- ConfigStore implementations auto-discovered via PF4J
- No hard-coded class names or reflection needed
- Consistent with Fetcher/Emitter factory pattern

- Created IgniteConfigStoreConfig class following HttpFetcherConfig pattern
- Parses JSON from ExtensionConfig to configure cache settings
- Added configStoreParams field to PipesConfig
- Updated TikaGrpcServerImpl to pass params to factory
- Removed TODO comment - configuration now fully implemented

Configuration options supported:
- cacheName: Name of the Ignite cache
- cacheMode: REPLICATED or PARTITIONED
- igniteInstanceName: Name of Ignite instance
- autoClose: Whether to auto-close on shutdown

Example configuration:
{
  "pipes": {
    "configStoreType": "ignite",
    "configStoreParams": {
      "cacheName": "my-cache",
      "cacheMode": "REPLICATED"
    }
  }
}
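
As a rough illustration of how the four options listed above might map onto a typed config object, the sketch below reads them from an already-parsed parameter map. It is not the PR's IgniteConfigStoreConfig: the class name, the `fromParams` helper, and every default value are assumptions made for the example, and JSON parsing is left out so it needs only the JDK.

```java
import java.util.Locale;
import java.util.Map;

// Stand-in for Ignite's cache mode enum, limited to the two modes named above.
enum SketchCacheMode { REPLICATED, PARTITIONED }

final class SketchIgniteStoreConfig {
    final String cacheName;
    final SketchCacheMode cacheMode;
    final String igniteInstanceName;
    final boolean autoClose;

    private SketchIgniteStoreConfig(String cacheName, SketchCacheMode cacheMode,
                                    String igniteInstanceName, boolean autoClose) {
        this.cacheName = cacheName;
        this.cacheMode = cacheMode;
        this.igniteInstanceName = igniteInstanceName;
        this.autoClose = autoClose;
    }

    // Builds a config from parsed configStoreParams; all defaults here are assumed.
    static SketchIgniteStoreConfig fromParams(Map<String, ?> params) {
        String mode = text(params, "cacheMode", "REPLICATED");
        return new SketchIgniteStoreConfig(
                text(params, "cacheName", "tika-config-store"),
                // valueOf rejects anything other than REPLICATED or PARTITIONED
                SketchCacheMode.valueOf(mode.toUpperCase(Locale.ROOT)),
                text(params, "igniteInstanceName", "tika-ignite"),
                Boolean.parseBoolean(text(params, "autoClose", "true")));
    }

    private static String text(Map<String, ?> params, String key, String fallback) {
        Object value = params.get(key);
        return value == null ? fallback : value.toString();
    }
}
```

Passing the `configStoreParams` from the example configuration yields a config with the given cache name and mode, while the omitted options fall back to the assumed defaults. An unsupported `cacheMode` value fails with `IllegalArgumentException`.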
- Added JSON configuration examples to README
- Documented all configStoreParams options
- Clarified difference between JSON and Java API usage
- Shows complete example with cache mode and instance name

Added detailed guide for deploying tika-grpc with Ignite clustering on Kubernetes:

- Ignite XML configuration with Kubernetes IP finder
- Complete RBAC setup (ServiceAccount, Role, RoleBinding)
- Headless service for pod discovery
- LoadBalancer service for external access
- StatefulSet with proper health checks and resource limits
- ConfigMap for Tika configuration
- Dockerfile example with Ignite plugin
- Troubleshooting guide for common issues
- Network policy considerations
- Pod anti-affinity recommendations
- Monitoring and verification steps

The guide ensures graceful pod-to-pod communication using:
- TcpDiscoveryKubernetesIpFinder for discovery
- Headless service for stable network identities
- Proper RBAC permissions for pod discovery
- StatefulSet for stable pod names and ordering

- Created IgniteConfigStorePlugin extending org.pf4j.Plugin
- Added plugin.properties with plugin metadata
- Added pf4j dependency to pom.xml
- Plugin now properly discoverable by PF4J plugin manager

Plugin metadata:
- plugin.id: tika-ignite-config-store-plugin
- plugin.class: org.apache.tika.pipes.plugin.ignite.IgniteConfigStorePlugin
- plugin.version: 4.0.0-SNAPSHOT
- plugin.provider: Apache Tika

This enables proper plugin discovery and lifecycle management
through the PF4J framework, consistent with other Tika plugins.

The plugin.class property is optional in PF4J. When not specified,
PF4J uses org.pf4j.Plugin as a default wrapper.

Since we don't need custom plugin lifecycle logic (start/stop/delete),
we can simplify by removing IgniteConfigStorePlugin and only keeping:
- plugin.properties (required for plugin metadata)
- @Extension annotation on IgniteConfigStoreFactory (for discovery)

This is cleaner and reduces boilerplate code while maintaining
full functionality.