Sink: Splunk HEC "raw" does not frame events #22969

@milas

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

Splunk HEC has two endpoint targets: "event" (JSON) and "raw" (text).

Raw events should be framed:

"HTTP Event Collector can parse raw text and extract one or more events. HEC expects that the HTTP request contains one or more events with line-breaking rules in effect."

(emphasis mine, source: https://docs.splunk.com/Documentation/SplunkCloud/9.3.2408/Data/FormateventsforHTTPEventCollector#Event_parsing)

Technically, on the Splunk side, you can configure event splitting. The default is:

LINE_BREAKER=([\r\n]+)

See https://docs.splunk.com/Documentation/SplunkCloud/latest/Data/Configureeventlinebreaking.
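To make the effect of that default concrete, here is a rough Python sketch of the splitting behavior. It only approximates what Splunk does: Python's `re` module stands in for Splunk's regex engine, `split_events` is a hypothetical helper written for this illustration, and the sample events are shortened stand-ins for the demo syslog lines.

```python
import re

# Splunk's default LINE_BREAKER. The text matched by the first capture
# group is treated as the boundary between events and is discarded.
LINE_BREAKER = re.compile(r"([\r\n]+)")


def split_events(body: str) -> list[str]:
    """Approximate event breaking: split the body on the capture group."""
    # With one capture group, re.split returns text and delimiters
    # alternately; the even indices hold the event text.
    parts = LINE_BREAKER.split(body)
    return [part for part in parts[::2] if part]


events = [
    "<28>Apr 30 16:51:52 host-a fwd[3303]: first message",
    "<81>Apr 30 16:51:53 host-b scraper[1082]: second message",
]

# Without framing, the events are concatenated back to back,
# so the breaker finds no boundary.
unframed = "".join(events)
# With newline framing, the breaker finds one boundary per event.
framed = "\n".join(events) + "\n"

print(len(split_events(unframed)))
print(len(split_events(framed)))
```

Under these assumptions the unframed body comes back as one event and the newline-framed body as two, which matches the behavior described in this report.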

In practice, this means there could be cases where you'd want to use a custom delimiter character for framing in Vector to match the LINE_BREAKER configured in Splunk.

By default, however, Splunk expects newline-delimited framing.

Currently, the Splunk HEC sink does NO framing and has NO config options to control it, unlike most other sinks.
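As a sketch of what is being asked for, the snippet below shows a request body built with a configurable delimiter. This is not Vector's implementation or API: `frame_batch` and its `delimiter` parameter are hypothetical names, meant only to show newline framing as the default with a custom character as an option.

```python
def frame_batch(events: list[str], delimiter: str = "\n") -> bytes:
    """Join already-encoded events into one HEC raw request body.

    Hypothetical helper: the delimiter is placed between events so the
    receiving side can break the body back into individual events.
    """
    return delimiter.join(events).encode("utf-8")


batch = [
    "<57>Apr 30 16:51:53 host-c alerter[9857]: third message",
    "<42>Apr 30 16:51:53 host-d fwd[7972]: fourth message",
]

# Default: newline framing, matching Splunk's default LINE_BREAKER.
newline_body = frame_batch(batch)

# Custom: a single delimiter character, for a Splunk sourcetype whose
# LINE_BREAKER has been changed to match it (here, an ASCII record separator).
custom_body = frame_batch(batch, delimiter="\x1e")

print(newline_body.decode("utf-8"))
```

The delimiter goes between events rather than after each one, so a batch with a single event is sent unchanged.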

Configuration

sources:
  syslog:
    type: demo_logs
    format: bsd_syslog
    interval: 0.25

sinks:
  splunk_raw:
    type: splunk_hec_logs
    inputs: [syslog]
    endpoint_target: raw
    acknowledgements:
      enabled: true
    endpoint: https://splunk.example.com:8088/
    default_token: xxx
    encoding:
      codec: raw_message
    sourcetype: syslog

A demo_logs interval of 0.25 yields ~4 events/sec, so each batch sent to Splunk contains multiple events (the default batch timeout is 1 sec).

Version

0.45.0

Debug Output

(Debug output does not contain anything useful, but if there's something specific you want, let me know.)

Example Data

Not dependent on event/data type. An easy way to repro is to use a demo_logs source to send some fake syslog to Splunk HEC raw. (See the example config above.)

For example, here are several demo syslog events that did not get split properly, ending up as a single event in Splunk:

<28>Apr 30 16:51:52 names.xn--3pxu8k fwd[3303]: Great Scott! We're never gonna reach 88 mph with the flux capacitor in its current state!<81>Apr 30 16:51:53 for.sakura scraper[1082]: Great Scott! We're never gonna reach 88 mph with the flux capacitor in its current state!<57>Apr 30 16:51:53 we.beauty alerter[9857]: #hugops to everyone who has to deal with this<42>Apr 30 16:51:53 random.audi fwd[7972]: You're not gonna believe what just happened

Additional Context

  • I'm currently still on 0.45.0, but I see no commits in 0.46.x that would change this behavior
  • Not sure of the best way to get a Framer into the Splunk HEC sink/config, given that it's internally pretty different from most others

Labels: sink: splunk_hec, type: bug