Skip to content

feat: add default multiline autodetection #215

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jun 10, 2025

Conversation

obs-gh-alexlew
Copy link
Collaborator

Description

OB-XXX Please explain the changes you made here.

Checklist

  • Created tests which fail without the change (if possible)
  • Extended the README / documentation, if necessary

Copy link

@orca-security-us orca-security-us bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Orca Security Scan Summary

Status Check Issues by priority
Passed Passed Infrastructure as Code high 0   medium 0   low 0   info 0 View in Orca
Passed Passed SAST high 0   medium 0   low 0   info 0 View in Orca
Passed Passed Secrets high 0   medium 0   low 0   info 0 View in Orca
Passed Passed Vulnerabilities high 0   medium 0   low 0   info 0 View in Orca

- id: multiline-recombine
type: recombine
combine_field: body
is_first_entry: body matches "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}|\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}(\\.\\d+)?Z?|[A-Za-z]+ \\d{1,2} \\d{2}:\\d{2}:\\d{2}|\\d{4}\\/\\d{2}\\/\\d{2} \\d{2}:\\d{2}:\\d{2})"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments on the regex:

  • can we document what this is and where it came from?
  • is \\/ right for forward slash? I'm surprised it needs to be escaped at all.
  • is the space vs T important? Can we combine those?
  • is - vs / important for dates? can we combine them?
    • ex for my last two combination requests: \d{4}[/-]\d{2}[/-]\d{2}[T ]\d{2}:\d{2}:\d{2}(\.\d+)?Z? and then it could be OR'ed with [A-Za-z]+ \d{1,2} \d{2}:\d{2}:\d{2} like you have it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are just based on the 4 timestamps that datadog supports https://docs.datadoghq.com/agent/logs/auto_multiline_detection/?tab=configurationfile#enable-multi-line-aggregation-per-integration and they're just OR'd together

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and the escaping is weird because the recombine is another layer of parsing, it was throwing errors without

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i've merged the regexs for slashes versus dashes, the other ones I think its clearer to leave them separate since they're less close

@@ -23,6 +23,12 @@ receivers:
operators:
- type: filter
expr: 'body matches "otel-contrib"'
{{- if .HostMonitoring.Logs.AutoMultilineDetection }}
- id: multiline-recombine
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this operator differ from the top level multiline config option?
https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/v0.124.0/receiver/filelogreceiver#example---multiline-logs-parsing
If they work the same, I think it's better to use the multiline option since it can be overridden by users more easily than a field in an array.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so i think specifically the order of operations between the multiline top level config and then the operators pipelines means that we need to define this as another operator after the container operator for kubernetes specifically. I dont actually know if its necessary in this case but figured it might be best to do it the same way everywhere so that the behavior is most likely to be similar.

Copy link
Collaborator

@obs-gh-mattcotter obs-gh-mattcotter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the comment, looks good!

@obs-gh-mattcotter obs-gh-mattcotter removed their assignment Jun 10, 2025
@obs-gh-alexlew obs-gh-alexlew merged commit 8a4fbd0 into main Jun 10, 2025
12 checks passed
@obs-gh-alexlew obs-gh-alexlew deleted the alew/add-multililne-auto branch June 10, 2025 18:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants