Description
Describe the bug
A distributed streaming custom command does not receive all events of the search in one invocation, it gets the events in several chunks.
We have a distributed streaming custom command which searches for hits of URLs in a list of IOCs. It has to process 10 million events, each containing a URL, and compare them against a list of about a million IOCs. The custom command builds an optimized cache data structure to be able to handle this load. This cache is stored in a local file on the indexer and loaded each time the command is invoked there.
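As context for why repeated invocations hurt, our load-or-build pattern looks roughly like this (a minimal sketch; the cache path, build_cache, and the use of a plain set with pickle are illustrative assumptions, not our actual code):

```python
import os
import pickle
import tempfile

# Hypothetical cache location; the real command uses a local file on the indexer.
CACHE_FILE = os.path.join(tempfile.gettempdir(), "ioc_cache.pkl")

def build_cache(iocs):
    # Stand-in for the expensive step: build an optimized lookup
    # structure (here simply a set) from the IOC list.
    return set(iocs)

def load_or_build_cache(iocs):
    # Load the cache from the local file if present,
    # otherwise build it and persist it for later invocations.
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE, "rb") as f:
            return pickle.load(f)
    cache = build_cache(iocs)
    with open(CACHE_FILE, "wb") as f:
        pickle.dump(cache, f)
    return cache
```

In our case even the load step takes about 120 seconds, which is why being invoked once per indexer (rather than once per small chunk) matters.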
This works well in our current environment, consisting of 96 indexers running Splunk 8.2.9 with Python 2 and splunklib 1.6.12. On average, each indexer processes about 100'000 events each time the custom command is called.
Switching to Python 3 and splunklib 1.7.4 in preparation for our planned migration to Splunk 9.x, we found the command failing. Further diagnosis showed that the command is called several times on each indexer, with just a small batch of events streamed on each invocation. This makes the whole caching mechanism useless, as it takes about 120 seconds to load the cache, while processing the 100'000 events takes only a few seconds.
Further investigation shows that the problem lies in splunklib 1.7.4; 1.6.12 works as expected as long as the size of the streamed data is not too big, otherwise SDK 1.6.12 simply hangs.
To Reproduce
Create an app containing a custom command p3s7lookupurldebugempty:
In ./bin, the Python file p3s7lookupurldebugempty.py:
import sys
import time

from splunklib.searchcommands import \
    dispatch, StreamingCommand, Configuration, Option, validators


@Configuration()
class p3s7lookupurldebugempty(StreamingCommand):

    def stream(self, records):
        cnt = 0
        file = "/tmp/debug.log"
        for record in records:
            cnt = cnt + 1
            yield {'_serial': 73 + cnt, '_time': time.time(), '_raw': "Counter %d" % cnt}
        with open(file, 'a') as logfile:
            logfile.write("%f build=202308210731p3s7 Finished with: %d events\n" % (time.time(), cnt))


_globals = {'__builtins__': __builtins__}

if __name__ == "__main__":
    dispatch(p3s7lookupurldebugempty, sys.argv, sys.stdin, sys.stdout, __name__)
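For comparison, outside the SDK the stream() logic itself processes all records in a single pass; a standalone sketch of the same counting loop (no Splunk involved, record contents illustrative) shows the single-invocation behavior we expect:

```python
import time

# Standalone version of the stream() body above, without the SDK:
# consume every record in one pass and report a single final count.
def stream(records):
    cnt = 0
    for record in records:
        cnt = cnt + 1
        yield {'_serial': 73 + cnt, '_time': time.time(), '_raw': "Counter %d" % cnt}
    print("Finished with: %d events" % cnt)

# Feeding 50000 synthetic records produces exactly one "Finished with" line.
results = list(stream({'nr': i} for i in range(50000)))
```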
Add it to commands.conf:
[p3s7testlookupurldebugempty]
filename = p3s7lookupurldebugempty.py
python.version=python3
chunked = true
Add splunklib as ./bin/splunklib.
Prepare test data:
| makeresults count=50000
| streamstats count as nr
| eval batch="50000Short"
| eval _time=time()-86400+nr
| table *
| collect index=main
Run the search:
index=main batch="50000Short" | p3s7testlookupurldebugempty
See the results in /tmp/debug.log on the indexers (only one indexer will show them, as only one indexer will have the events to process):
1692596001.205690 build=202308210731p3s7 Finished with: 50 events
1692596001.240201 build=202308210731p3s7 Finished with: 449 events
1692596001.431701 build=202308210731p3s7 Finished with: 2500 events
1692596001.633610 build=202308210731p3s7 Finished with: 2490 events
1692596001.832549 build=202308210731p3s7 Finished with: 2500 events
1692596002.020129 build=202308210731p3s7 Finished with: 2500 events
1692596002.211033 build=202308210731p3s7 Finished with: 2500 events
1692596002.396127 build=202308210731p3s7 Finished with: 2500 events
1692596002.593029 build=202308210731p3s7 Finished with: 2500 events
1692596002.780966 build=202308210731p3s7 Finished with: 2500 events
1692596002.966760 build=202308210731p3s7 Finished with: 2500 events
1692596003.164372 build=202308210731p3s7 Finished with: 2500 events
1692596003.551674 build=202308210731p3s7 Finished with: 4999 events
1692596004.515754 build=202308210731p3s7 Finished with: 12500 events
1692596005.037160 build=202308210731p3s7 Finished with: 7012 events
1692596005.038402 build=202308210731p3s7 Finished with: 0 events
1692596005.038932 build=202308210731p3s7 Finished with: 0 events
1692596005.039385 build=202308210731p3s7 Finished with: 0 events
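The chunk counts above do add up to the full result set; summing them confirms that no events are lost, only split across invocations:

```python
# Per-invocation counts taken from the /tmp/debug.log output above.
counts = [50, 449, 2500, 2490, 2500, 2500, 2500, 2500, 2500, 2500,
          2500, 2500, 4999, 12500, 7012, 0, 0, 0]

total = sum(counts)
print("%d events in %d invocations" % (total, len(counts)))  # → 50000 events in 18 invocations
```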
Expected behavior
Expected result would be:
1692595462.334300 build=202308180849p2 Finished with: 50000 events
Splunk:
- Version: 8.2.0
- OS: Red Hat Enterprise Linux release 8.3 (Ootpa) (Oracle 5.4.17-2102.201.3.el8uek.x86_64)
- Deployment: Distributed, see below
SDK (please complete the following information):
- Version: 1.7.4
- Language Runtime Version: Python 3.7.10
- OS: Red Hat Enterprise Linux release 8.3 (Ootpa) (Oracle 5.4.17-2102.201.3.el8uek.x86_64)
Additional context
This test has been executed on the following environment:
- single search head
- 1 master node
- 4 node indexer cluster
All on Splunk 8.2.0 (the same error also occurs on Splunk 8.2.9).