This receiver tails and parses logs from files.
| Status | |
|---|---|
| Stability | beta: logs |
| Distributions | contrib, k8s |
| Code Owners | @andrzej-stencel, @paulojmdias, @VihasMakwana, @braydonk \| Seeking more code owners! |
| Emeritus | @djaglowski |
| Field | Default | Description |
|---|---|---|
| `include` | required | A list of file glob patterns that match the file paths to be read. |
| `exclude` | `[]` | A list of file glob patterns to exclude from reading. This is applied against the paths matched by `include`. |
| `exclude_older_than` | | Exclude files whose modification time is older than the specified age. |
| `start_at` | `end` | At startup, where to start reading logs from the file. Options are `beginning` or `end`. |
| `multiline` | | A `multiline` configuration block. See below for more details. |
| `force_flush_period` | `500ms` | Time since new data was last found in the file, after which a partial log at the end of the file may be emitted. |
| `encoding` | `utf-8` | The encoding of the file being read. See the list of supported encodings below for available options. |
| `preserve_leading_whitespaces` | `false` | Whether to preserve leading whitespaces. |
| `preserve_trailing_whitespaces` | `false` | Whether to preserve trailing whitespaces. |
| `include_file_name` | `true` | Whether to add the file name as the attribute `log.file.name`. |
| `include_file_path` | `false` | Whether to add the file path as the attribute `log.file.path`. |
| `include_file_name_resolved` | `false` | Whether to add the file name after symlink resolution as the attribute `log.file.name_resolved`. |
| `include_file_path_resolved` | `false` | Whether to add the file path after symlink resolution as the attribute `log.file.path_resolved`. |
| `include_file_owner_name` | `false` | Whether to add the file owner name as the attribute `log.file.owner.name`. Not supported on Windows. |
| `include_file_owner_group_name` | `false` | Whether to add the file group name as the attribute `log.file.owner.group.name`. Not supported on Windows. |
| `include_file_permissions` | `false` | Whether to add the file permissions as the attribute `log.file.permissions` in 3-digit octal format (e.g., `755`). Not supported on Windows. |
| `include_file_record_number` | `false` | Whether to add the record number in the file as the attribute `log.file.record_number`. |
| `include_file_record_offset` | `false` | Whether to add the record offset in the file as the attribute `log.file.record_offset`. |
| `poll_interval` | `200ms` | The duration between filesystem polls. |
| `fingerprint_size` | `1kb` | The number of bytes with which to identify a file. The first bytes in the file are used as the fingerprint. Decreasing this value at any point will cause existing fingerprints to be forgotten, meaning that all files will be read from the beginning (one time). |
| `initial_buffer_size` | `16KiB` | The initial size of the buffer used to read headers and logs; the buffer grows as necessary. Larger values may lead to unnecessarily large buffer allocations, while smaller values may lead to many copies as the buffer grows. |
| `max_log_size` | `1MiB` | The maximum size of a log entry to read. The behavior for oversized log entries is controlled by `max_log_size_behavior`. Protects against reading large amounts of data into memory. |
| `max_log_size_behavior` | `split` | Behavior when a log entry exceeds `max_log_size`. Options are `split` (default), which splits oversized entries into multiple log entries, or `truncate`, which truncates the entry and drops the remainder. |
| `max_concurrent_files` | `1024` | The maximum number of log files from which logs will be read concurrently. If the number of files matched by the `include` pattern exceeds this number, files will be processed in batches. |
| `max_batches` | `0` | Only applicable when files must be batched in order to respect `max_concurrent_files`. This value limits the number of batches processed during a single poll interval. A value of `0` indicates no limit. |
| `delete_after_read` | `false` | If `true`, each log file will be read and then immediately deleted. Requires that the `filelog.allowFileDeletion` feature gate is enabled. Must be `false` when `start_at` is set to `end`. |
| `acquire_fs_lock` | `false` | Whether to attempt to acquire a filesystem lock before reading a file (Unix only). |
| `attributes` | `{}` | A map of `key: value` pairs to add to the entry's attributes. |
| `resource` | `{}` | A map of `key: value` pairs to add to the entry's resource. |
| `operators` | `[]` | An array of operators. See below for more details. |
| `storage` | none | The ID of a storage extension to be used to store file offsets. File offsets allow the receiver to pick up where it left off in the case of a collector restart. If no storage extension is used, the receiver will manage offsets in memory only. |
| `header` | nil | Specifies options for parsing header metadata. Requires that the `filelog.allowHeaderMetadataParsing` feature gate is enabled. See below for details. Must not be set when `start_at` is set to `end`. |
| `header.pattern` | required for header metadata parsing | A regex that matches every header line. |
| `header.metadata_operators` | required for header metadata parsing | A list of operators used to parse metadata from the header. |
| `retry_on_failure.enabled` | `false` | If `true`, the receiver will pause reading a file and attempt to resend the current batch of logs if it encounters an error from downstream components. |
| `retry_on_failure.initial_interval` | `1s` | Time to wait after the first failure before retrying. |
| `retry_on_failure.max_interval` | `30s` | Upper bound on retry backoff interval. Once this value is reached, the delay between consecutive retries remains constant at the specified value. |
| `retry_on_failure.max_elapsed_time` | `5m` | Maximum amount of time (including retries) spent trying to send a logs batch to a downstream consumer. Once this value is reached, the data is discarded. Retrying never stops if set to `0`. |
| `ordering_criteria.regex` | | Regular expression used for sorting. Should contain the named capture groups that are used in `regex_key`. |
| `ordering_criteria.group_by` | | Regular expression used for grouping, which is done before sorting. Should contain a named capture group. |
| `ordering_criteria.top_n` | `1` | The number of files to track when using file ordering. The top N files are tracked after applying the ordering criteria. |
| `ordering_criteria.sort_by.regex_key` | | The named capture group, defined in `ordering_criteria.regex`, to use for sorting. |
| `ordering_criteria.sort_by.sort_type` | | Type of sorting to be performed (e.g., `numeric`, `alphabetical`, `timestamp`, `mtime`). |
| `ordering_criteria.sort_by.location` | | Relevant if `sort_type` is set to `timestamp`. Defines the location of the timestamp of the file. |
| `ordering_criteria.sort_by.format` | | Relevant if `sort_type` is set to `timestamp`. Defines the `strptime` format of the timestamp being sorted. |
| `ordering_criteria.sort_by.ascending` | | Sort direction. |
| `compression` | | Indicates the compression format of input files. If set, files are read using a reader that decompresses the file before scanning its content. Options are `gzip`, `auto`, or unset (no compression, the default). `auto` auto-detects the file compression type; currently, gzip files are the only compressed files auto-detected, based on their headers (see RFC 1952). The `auto` option is useful when ingesting a mix of compressed and uncompressed files with the same filelog receiver. |
| `polls_to_archive` | `0` | This setting controls the number of poll cycles to store on disk rather than being discarded. By default, the receiver purges the record of readers that have existed for 3 generations. Refer to archiving and polling for more details. Note: this feature is experimental. |
| `on_truncate` | `ignore` | Behavior when a file with the same fingerprint is detected but with a smaller size (indicating a copytruncate rotation). Options are `ignore`, `read_whole_file`, or `read_new`. See handling copytruncate rotation for more details. |
Note that by default, no logs will be read from a file that is not actively being written to, because `start_at` defaults to `end`.
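For example, to also read content already present in matched files at startup, `start_at` can be set explicitly (a minimal sketch; the path is illustrative):

```yaml
receivers:
  filelog:
    include: [ /var/log/myapp/*.log ]
    start_at: beginning  # read existing content, not just newly written lines
```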
Each operator performs a simple responsibility, such as parsing a timestamp or JSON. Chain together operators to process logs into a desired format.
- Every operator has a `type`.
- Every operator can be given a unique `id`. If you use the same type of operator more than once in a pipeline, you must specify an `id`. Otherwise, the `id` defaults to the value of `type`.
- Operators will output to the next operator in the pipeline. The last operator in the pipeline will emit from the receiver. Optionally, the `output` parameter can be used to specify the `id` of another operator to which logs will be passed directly, as in the sketch below.
- Only parsers and general purpose operators should be used.
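A minimal sketch of `id` and `output` routing (the operator ids, the timestamp attribute, and its layout are illustrative assumptions, not part of this receiver's defaults):

```yaml
receivers:
  filelog:
    include: [ /var/log/myapp/*.log ]
    operators:
      - type: json_parser
        id: parse_json
        output: parse_time        # route directly to the operator with this id
      - type: time_parser         # hypothetical follow-up stage
        id: parse_time
        parse_from: attributes.timestamp
        layout: '%Y-%m-%dT%H:%M:%S'
```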
If set, the `multiline` configuration block instructs the `file_input` operator to split log entries on a pattern other than newlines.
The `multiline` configuration block must contain exactly one of `line_start_pattern` or `line_end_pattern`. These are regex patterns that match either the beginning of a new log entry, or the end of a log entry.

The `omit_pattern` setting can be used to omit the start/end pattern from each entry.
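As an illustration, a hedged sketch that splits entries on a trailing delimiter and strips it from each entry (the semicolon delimiter and path are hypothetical):

```yaml
receivers:
  filelog:
    include: [ /var/log/example/delimited.log ]
    multiline:
      line_end_pattern: ';$'   # each entry ends with a semicolon
      omit_pattern: true       # drop the matched pattern from the emitted entry
```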
| Key | Description |
|---|---|
| `nop` | No encoding validation. Treats the file as a stream of raw bytes. |
| `utf-8` | UTF-8 encoding. |
| `utf-8-raw` | UTF-8 encoding without replacing invalid UTF-8 bytes. |
| `utf-16le` | UTF-16 encoding with little-endian byte order. |
| `utf-16be` | UTF-16 encoding with big-endian byte order. |
| `ascii` | ASCII encoding. |
| `big5` | The Big5 Chinese character encoding. |
Other less common encodings are supported on a best-effort basis. See https://www.iana.org/assignments/character-sets/character-sets.xhtml for other available encodings.
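For example, to read a file written in UTF-16 with little-endian byte order (the path is illustrative):

```yaml
receivers:
  filelog:
    include: [ /var/log/windows-app/*.log ]
    encoding: utf-16le
```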
To enable header metadata parsing, the `filelog.allowHeaderMetadataParsing` feature gate must be set, and `start_at` must be `beginning`.
If set, the file input operator will attempt to read a header from the start of the file. Each header line must match the `header.pattern` pattern. Each line is emitted into a pipeline defined by `header.metadata_operators`. Any attributes on the resultant entry from the embedded pipeline will be merged with the attributes from previous lines (attribute collisions will be resolved with an upsert strategy). After all header lines are read, the final merged header attributes will be present on every log line that is emitted for the file.
The header lines are not emitted by the receiver.
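A sketch under stated assumptions: the feature gate is enabled, and each file begins with a hypothetical `#env: ...` header line (the pattern, regex, and attribute name are all illustrative):

```yaml
receivers:
  filelog:
    include: [ /var/log/myapp/*.log ]
    start_at: beginning   # required for header metadata parsing
    header:
      pattern: '^#'       # every header line starts with '#'
      metadata_operators:
        - type: regex_parser
          regex: '^#env: (?P<environment>.+)$'   # hypothetical header field
```

With this, logs from a file beginning with `#env: prod` would carry an `environment=prod` attribute.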
- An entry is the base representation of log data as it moves through a pipeline. All operators either create, modify, or consume entries.
- A field is used to reference values in an entry.
- A common expression syntax is used in several operators. For example, expressions can be used to filter or route entries.
Many parser operators can be configured to embed certain follow-up operations, such as timestamp and severity parsing. For more information, see complex parsers.
All time parameters must have a unit of time specified, e.g. `200ms`, `1s`, `1m`.
File Log Receiver can read files that are being rotated. It supports both common rotation strategies: move/create (the file is renamed and a new file is created) and copy/truncate (the file is copied to a backup and the original is truncated). The receiver tracks files by their internal identity (inode) and content fingerprint, allowing it to handle both strategies transparently and continue reading data even if the new filename no longer matches the include pattern.
The receiver handles file attributes differently depending on whether rotated files match your include pattern:
**Rotated files NOT matching the include pattern:**

When a file is rotated and the rotated filename no longer matches the `include` pattern, the receiver preserves the original file attributes (e.g., `log.file.name`, `log.file.path`). This ensures logs read from the rotated file continue to be associated with their original file identity.

Example: With `include: /var/log/pods/*/*/0.log`, when `0.log` is rotated to `0.log.20260115-120000`, logs from the rotated file will still report `log.file.name=0.log`.

**Rotated files matching the include pattern:**

When a file is rotated and the rotated filename continues to match the `include` pattern, the receiver reports the new rotated filename in file attributes. This is expected behavior that prevents duplicate metrics when tracking per-file consumption.

Example: With `include: /var/log/pods/*/*/*.log*`, when `0.log` is rotated to `0.log.20260115-120000`, logs from the rotated file will report `log.file.name=0.log.20260115-120000`.
For more details, see issue #38454.
When log files are rotated using the copytruncate strategy (where the file is copied and then truncated in place), the receiver can detect when a file has been truncated by comparing the stored offset with the current file size. The on_truncate setting controls how the receiver behaves when truncation is detected:
- `ignore` (default): The receiver keeps the original offset and will not read any data until the file grows past the original offset. This prevents duplicate log ingestion when a file is rotated.
- `read_whole_file`: The receiver resets the offset to 0 and reads the entire file from the beginning. Use this mode when you want to ensure no data loss, even if it means potentially re-reading some logs.
- `read_new`: The receiver updates the offset to the current file size (the position after truncation). This allows reading new data that is written after the truncation without re-reading existing content.
Example configuration:

```yaml
receivers:
  filelog:
    include: [ /var/log/myapp/*.log ]
    on_truncate: read_whole_file # Read entire file after copytruncate rotation
    operators:
      - type: json_parser
```

When to use each mode:

- Use `ignore` when you want to avoid duplicate logs and your log rotation strategy ensures that rotated files are properly renamed or moved.
- Use `read_whole_file` when data completeness is critical and you can tolerate duplicate logs, or when you have deduplication logic downstream.
- Use `read_new` when files are expected to be deleted after rotation and you want to read only new data written after the truncation point.
Receiver Configuration

```yaml
receivers:
  filelog:
    include: [ /var/log/myservice/*.json ]
    operators:
      - type: json_parser
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%d %H:%M:%S'
```

Receiver Configuration

```yaml
receivers:
  filelog:
    include: [ /simple.log ]
    operators:
      - type: regex_parser
        regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<sev>[A-Z]*) (?P<msg>.*)$'
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%d %H:%M:%S'
        severity:
          parse_from: attributes.sev
```

The above configuration will read logs from the "simple.log" file. Some examples of logs that it will read:

```
2023-06-19 05:20:50 ERROR This is a test error message
2023-06-20 12:50:00 DEBUG This is a test debug message
```
Receiver Configuration

```yaml
receivers:
  filelog:
    include:
      - /var/log/example/multiline.log
    multiline:
      line_start_pattern: ^Exception
```

The above configuration will be able to parse multiline logs, splitting a new entry every time the `^Exception` pattern is matched.

```
Exception in thread 1 "main" java.lang.NullPointerException
        at com.example.myproject.Book.getTitle(Book.java:16)
        at com.example.myproject.Author.getBookTitles(Author.java:25)
        at com.example.myproject.Bootstrap.main(Bootstrap.java:14)
Exception in thread 2 "main" java.lang.NullPointerException
        at com.example.myproject.Book.getTitle(Book.java:16)
        at com.example.myproject.Author.getBookTitles(Author.java:25)
        at com.example.myproject.Bootstrap.main(Bootstrap.java:44)
```
Receiver Configuration

```yaml
receivers:
  filelog:
    include:
      - /var/log/example/compressed.log.gz
    compression: gzip
```

The above configuration reads gzip-compressed log files by setting the `compression` option to `gzip`. When this option is set, all matched files are scanned using a gzip reader that decompresses the file content before scanning through it. Please note that if the compressed file is expected to be updated, additional compressed logs must be appended to the compressed file, rather than recompressing the whole content and overwriting the previous file.
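To ingest a mix of compressed and uncompressed files with one receiver, the `auto` option described in the configuration table can be used (a sketch; paths are illustrative):

```yaml
receivers:
  filelog:
    include:
      - /var/log/example/*.log
      - /var/log/example/*.log.gz
    compression: auto   # detect gzip by file headers; read other files as plain text
```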
The `storage` setting allows you to define the proper storage extension for storing file offsets. While the `storage` parameter can ensure that log files are consumed accurately, it is possible that logs are dropped while moving downstream through other components in the collector. For additional resiliency, see the Fault tolerant log collection example.
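A hedged sketch wiring the receiver to the `file_storage` extension (the directory, exporter, and pipeline wiring are illustrative):

```yaml
extensions:
  file_storage:
    directory: /var/lib/otelcol/file_storage

receivers:
  filelog:
    include: [ /var/log/myapp/*.log ]
    storage: file_storage   # persist offsets across collector restarts

exporters:
  debug: {}

service:
  extensions: [file_storage]
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [debug]
```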
Here is some of the information the file log receiver stores:
- The number of files it is currently tracking (`knownFiles`).
- For each file being tracked:
  - The fingerprint of the file (`Fingerprint.first_bytes`).
  - The byte offset from the start of the file, indicating the position from which the file log receiver continues reading the file (`Offset`).
  - An arbitrary set of file attributes, such as the name of the file (`FileAttributes`).
Exactly how this information is serialized depends on the type of storage being used.
If the `polls_to_archive` setting is used in conjunction with the `storage` setting, file offsets older than three poll cycles are stored on disk rather than being discarded. This feature enables the receiver to remember files for a longer period while using a limited amount of memory.

This is useful when the `exclude_older_than` setting is used and the user wants the receiver to remember file offsets for a longer period of time. This helps prevent duplication if a file is modified after the `exclude_older_than` duration has passed.

Note that if the `polls_to_archive` setting is used without specifying `storage`, the receiver reverts to the default behavior, i.e., purging the record of readers that have existed for 3 generations.
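For example, a sketch combining archiving with a storage extension (the values are illustrative; the feature is experimental):

```yaml
receivers:
  filelog:
    include: [ /var/log/myapp/*.log ]
    exclude_older_than: 24h
    storage: file_storage     # a configured storage extension, as in the sketch above
    polls_to_archive: 10      # keep 10 poll cycles of offsets on disk
```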
If the receiver is being used to track a symlinked file and the symlink target is expected to change frequently, make sure
to set the value of the `poll_interval` setting to something lower than the symlink update frequency.
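For instance, if the symlink target is replaced roughly every second, polling faster than that keeps reads current (the path and values are hypothetical):

```yaml
receivers:
  filelog:
    include: [ /var/log/current.log ]   # a symlink re-pointed roughly every 1s
    poll_interval: 100ms                # poll faster than the symlink changes
```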
Enabling Collector metrics
will also provide telemetry metrics for the state of the receiver's file consumption.
Specifically, the `otelcol_fileconsumer_open_files` and `otelcol_fileconsumer_reading_files` metrics are provided.
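A sketch of enabling the collector's internal metrics, through which these counters are exposed (the level shown is one plausible choice):

```yaml
service:
  telemetry:
    metrics:
      level: detailed
```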
When this feature gate is enabled, the receiver uses protobuf encoding for checkpoint storage instead of JSON. This provides improved performance (~7x faster decoding) and reduced storage usage (~31% smaller).
The feature includes full backward compatibility:
- The receiver can always read both protobuf and JSON checkpoints regardless of the feature gate setting
- When the feature gate is enabled, new checkpoints are written in protobuf format
- When the feature gate is disabled, new checkpoints are written in JSON format
To enable this feature gate, use the flag `--feature-gates=filelog.protobufCheckpointEncoding`.
Schedule for this feature gate is:
- Introduce as `Alpha` (disabled by default) in `v0.148.0`
When this feature gate is enabled, the fingerprint of a compressed file is computed by first decompressing its data. Note that it is important to set `compression` to a non-empty value for this to work.

Enabling this gate can cause existing gzip files to be re-ingested because of changes in how fingerprints are computed.
Schedule for this feature gate is:
- Introduce as `Alpha` (disabled by default) in `v0.128.0`
- Move to `Beta` (enabled by default) in `v0.133.0`
- Move to `Stable` (cannot be disabled) in `v0.142.0`