A Go library for searching and querying Buildkite CI/CD logs with intelligent caching and high-performance data analytics. Includes CLI tools for testing and debugging log parsing.
This library provides a high-level client API for searching and querying Buildkite CI/CD logs with intelligent caching and fast data analytics. Unlike terminal-to-html which focuses on log display and rendering, this library is designed for log data analysis, search, and programmatic access.
The library automatically downloads logs from the Buildkite API, caches them locally as efficient Parquet files, and provides powerful search and query capabilities. It handles Buildkite's special OSC sequence format (\x1b_bk;t=timestamp\x07content) and converts logs into structured, searchable data.
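To make the format concrete, here is a minimal, self-contained sketch (not the library's internal parser, which also handles lines without timestamps, multiple sequences per line, and ANSI codes) showing how one OSC-wrapped line breaks down into a timestamp and its content:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// parseOSCLine illustrates the \x1b_bk;t=<millis>\x07<content> framing.
func parseOSCLine(line string) (time.Time, string, bool) {
	const prefix = "\x1b_bk;t="
	if !strings.HasPrefix(line, prefix) {
		return time.Time{}, line, false
	}
	rest := line[len(prefix):]
	sep := strings.IndexByte(rest, '\x07') // BEL terminates the timestamp header
	if sep < 0 {
		return time.Time{}, line, false
	}
	millis, err := strconv.ParseInt(rest[:sep], 10, 64)
	if err != nil {
		return time.Time{}, line, false
	}
	return time.UnixMilli(millis), rest[sep+1:], true
}

func main() {
	ts, content, ok := parseOSCLine("\x1b_bk;t=1745358209921\x07~~~ Running global environment hook")
	fmt.Println(ok, ts.UTC().Format(time.RFC3339Nano), content)
	// true 2025-04-22T21:43:29.921Z ~~~ Running global environment hook
}
```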
- Features
- Quick Start
- CLI Tools
- Installation
- Examples
- Querying Parquet Files
- API Reference
- Performance
- Testing
- License
- Intelligent Caching: Automatic download and caching of Buildkite logs with Time To Live (TTL) support
- Fast Search & Query: Built-in search capabilities with regex patterns, filtering, and context
- Buildkite API Integration: Direct fetching from Buildkite jobs via REST API with authentication
- Parquet Storage: Efficient columnar storage for fast analytics and data processing using Apache Arrow.
- Streaming Processing: Memory-efficient processing of logs of any size using Go iterators
- Observability Hooks: Optional hooks for tracing and logging without framework coupling
- OSC Sequence Parsing: Correctly handles Buildkite's \x1b_bk;t=timestamp\x07content format
- Group Tracking: Automatically associates entries with build sections (~~~, ---, +++)
- Content Classification: Identifies commands, group headers, and regular output
- ANSI Code Handling: Optional stripping of ANSI escape sequences for clean text output
- Multiple Output Formats: Text, JSON, and Parquet export with filtering support
- Parse Command: Convert logs to various formats for testing
- Query Command: Fast querying of cached Parquet files
- Debug Command: Troubleshoot OSC sequence parsing issues
For common use cases, the library provides a high-level Client API that simplifies downloading, caching, and querying Buildkite logs:
package main
import (
"context"
"fmt"
"time"
"github.com/buildkite/go-buildkite/v4"
buildkitelogs "github.com/buildkite/buildkite-logs"
)
func main() {
// Create buildkite client
client, _ := buildkite.NewOpts(buildkite.WithTokenAuth("your-token"))
ctx := context.Background()
// Create high-level Client
buildkiteLogsClient, err := buildkitelogs.NewClient(ctx, client, "file://~/.bklog")
if err != nil {
panic(err)
}
defer buildkiteLogsClient.Close()
// Download, cache, and get a reader in one step
reader, err := buildkiteLogsClient.NewReader(
ctx, "myorg", "mypipeline", "123", "job-id",
time.Minute*5, false, // TTL and force refresh
)
if err != nil {
panic(err)
}
// Query the logs
for entry, err := range reader.ReadEntriesIter() {
if err != nil {
panic(err)
}
fmt.Println(entry.Content)
}
}
The Client provides:
- Simplified API: Easy-to-use methods for common operations
- Automatic caching: Intelligent caching with TTL support
- Multiple backends: Support for both the official *buildkite.Client and custom BuildkiteAPI implementations
- Parameter validation: Built-in validation with descriptive error messages
- Hooks System: Optional hooks for observability and tracing without coupling to specific frameworks
For detailed documentation, see docs/client-api.md. For a complete working example, see examples/high-level-client/.
Using Make (recommended):
# Build with tests and linting
make all
# Quick development build
make dev
# Build with specific version
make build VERSION=v1.2.3
# Other useful targets
make clean test lint help
Manual build:
make build
Build a snapshot with goreleaser:
goreleaser build --snapshot --clean --single-target
Check version:
./build/bklog version
# or
./build/bklog -v
# or
./build/bklog --version
Parse a log file with timestamps:
./build/bklog parse -file buildkite.log
Output only sections:
./build/bklog parse -file buildkite.log -filter section
Output only group headers:
./build/bklog parse -file buildkite.log -filter group
JSON output:
./build/bklog parse -file buildkite.log -json
Fetch logs directly from Buildkite API:
export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog parse -org myorg -pipeline mypipeline -build 123 -job abc-def-456
Export API logs to Parquet:
export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog parse -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -parquet logs.parquet -summary
Filter and export only sections from API:
export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog parse -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -filter section -json
Show processing statistics:
./build/bklog parse -file buildkite.log -summary
Output:
--- Processing Summary ---
Bytes processed: 24.4 KB
Total entries: 212
Entries with timestamps: 212
Sections: 13
Regular output: 184
Show group/section information:
./build/bklog parse -file buildkite.log -groups | head -5
Output:
[2025-04-22 21:43:29.921] [~~~ Running global environment hook] ~~~ Running global environment hook
[2025-04-22 21:43:29.921] [~~~ Running global environment hook] $ /buildkite/agent/hooks/environment
[2025-04-22 21:43:29.948] [~~~ Running global pre-checkout hook] ~~~ Running global pre-checkout hook
[2025-04-22 21:43:29.949] [~~~ Running global pre-checkout hook] $ /buildkite/agent/hooks/pre-checkout
[2025-04-22 21:43:29.975] [~~~ Preparing working directory] ~~~ Preparing working directory
Export to Parquet format:
./build/bklog parse -file buildkite.log -parquet output.parquet -summary
Output:
--- Processing Summary ---
Bytes processed: 24.4 KB
Total entries: 212
Entries with timestamps: 212
Sections: 13
Regular output: 184
Exported 212 entries to output.parquet
Export filtered data to Parquet:
./build/bklog parse -file buildkite.log -parquet sections.parquet -filter section -summary
This exports only section entries to a smaller Parquet file for analysis.
The CLI provides fast query operations on previously exported Parquet files:
List all groups with statistics:
./build/bklog query -file output.parquet -op list-groups
Output:
Groups found: 5
GROUP NAME ENTRIES FIRST SEEN LAST SEEN
------------------------------------------------------------------------------------------------------------
~~~ Running global environment hook 2 1 2025-04-22 21:43:29 2025-04-22 21:43:29
~~~ Running global pre-checkout hook 2 1 2025-04-22 21:43:29 2025-04-22 21:43:29
--- :package: Build job checkout dire... 2 1 2025-04-22 21:43:30 2025-04-22 21:43:30
--- Query Statistics ---
Total entries: 10
Matched entries: 10
Total groups: 5
Query time: 2.36 ms
Filter entries by group pattern:
./build/bklog query -file output.parquet -op by-group -group "environment"
Output:
Entries in group matching 'environment': 2
[2025-04-22 21:43:29.921] [GRP] ~~~ Running global environment hook
[2025-04-22 21:43:29.922] [CMD] $ /buildkite/agent/hooks/environment
--- Query Statistics ---
Total entries: 10
Matched entries: 2
Query time: 0.36 ms
Search entries using regex patterns:
./build/bklog query -file output.parquet -op search -pattern "git clone"
Output:
Matches found: 1
[2025-04-22 21:43:29.975] [~~~ Preparing working directory] MATCH: $ git clone -v -- https://github.com/buildkite/bash-example.git .
--- Search Statistics (Streaming) ---
Total entries: 212
Matches found: 1
Query time: 0.65 ms
Search with context lines (ripgrep-style):
./build/bklog query -file output.parquet -op search -pattern "error|failed" -C 3
Output:
Matches found: 2
[2025-04-22 21:43:30.690] [~~~ Running script] Running tests...
[2025-04-22 21:43:30.691] [~~~ Running script] Test suite started
[2025-04-22 21:43:30.692] [~~~ Running script] Running unit tests
[2025-04-22 21:43:30.693] [~~~ Running script] MATCH: Test failed: authentication error
[2025-04-22 21:43:30.694] [~~~ Running script] Cleaning up test files
[2025-04-22 21:43:30.695] [~~~ Running script] Test run completed
[2025-04-22 21:43:30.696] [~~~ Running script] Generating report
--
[2025-04-22 21:43:30.750] [~~~ Post-processing] Validating results
[2025-04-22 21:43:30.751] [~~~ Post-processing] Checking exit codes
[2025-04-22 21:43:30.752] [~~~ Post-processing] Build status: some tests failed
[2025-04-22 21:43:30.753] [~~~ Post-processing] MATCH: Build failed due to test failures
[2025-04-22 21:43:30.754] [~~~ Post-processing] Uploading logs
[2025-04-22 21:43:30.755] [~~~ Post-processing] Notifying team
[2025-04-22 21:43:30.756] [~~~ Post-processing] Cleanup completed
Search with separate before/after context:
./build/bklog query -file output.parquet -op search -pattern "npm install" -B 2 -A 5
Case-sensitive search:
./build/bklog query -file output.parquet -op search -pattern "ERROR" -case-sensitive
Invert match (show non-matching lines):
./build/bklog query -file output.parquet -op search -pattern "buildkite" -invert-match -limit 5
Reverse search (find recent errors first):
./build/bklog query -file output.parquet -op search -pattern "error|failed" -reverse -C 2
Reverse search from specific position:
./build/bklog query -file output.parquet -op search -pattern "test.*failed" -reverse -search-seek 1000
Search with JSON output:
./build/bklog query -file output.parquet -op search -pattern "git clone" -format json -C 1
JSON output for programmatic use:
./build/bklog query -file output.parquet -op list-groups -format json
Query without statistics:
./build/bklog query -file output.parquet -op list-groups -stats=false
Query last 20 entries:
./build/bklog query -file output.parquet -op tail -tail 20
Query specific row position:
./build/bklog query -file output.parquet -op seek -seek 100
Limit query results:
./build/bklog query -file output.parquet -op by-group -group "test" -limit 50
Get file information:
./build/bklog query -file output.parquet -op info
Dump all entries from the file:
./build/bklog query -file output.parquet -op dump
Dump with limited entries:
./build/bklog query -file output.parquet -op dump -limit 100
Dump all entries as JSON:
./build/bklog query -file output.parquet -op dump -format json
Dump entries with raw output (no timestamps/groups):
./build/bklog query -file output.parquet -op dump -raw
Dump entries with ANSI codes stripped:
./build/bklog query -file output.parquet -op dump -strip-ansi
The query command also supports direct API integration, automatically downloading and caching logs from Buildkite:
Query logs directly from Buildkite API:
export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog query -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -op list-groups
Query specific group from API logs:
export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog query -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -op by-group -group "tests"
Search API logs with regex patterns:
export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog query -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -op search -pattern "error|failed" -C 2
Search API logs with case sensitivity:
export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog query -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -op search -pattern "ERROR" -case-sensitive
Reverse search API logs (find recent failures):
export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog query -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -op search -pattern "test.*failed" -reverse -C 2
Query last 10 entries from API logs:
export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog query -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -op tail -tail 10
Get file info for cached API logs:
export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog query -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -op info
Dump all entries from API logs:
export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog query -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -op dump
Query with custom cache TTL:
export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog query -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -op info -cache-ttl=5m
Force refresh cached logs:
export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog query -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -op list-groups -cache-force-refresh
Use custom cache location:
export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog query -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -op info -cache-url=file:///tmp/bklogs
Logs are automatically downloaded and cached in ~/.bklog/ as {org}-{pipeline}-{build}-{job}.parquet files. Subsequent queries reuse the cached copy until the TTL expires (for jobs that are still running), a force refresh is requested, or the cache is cleared manually.
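Because the cache is plain Parquet on disk, a cached job can also be opened directly from Go with the library's streaming reader (documented in the API Reference below). A small sketch, assuming a cache file named after the {org}-{pipeline}-{build}-{job}.parquet pattern above:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"

	buildkitelogs "github.com/buildkite/buildkite-logs"
)

func main() {
	// Hypothetical cached file produced by an earlier `bklog query` against the API.
	home, _ := os.UserHomeDir()
	cached := filepath.Join(home, ".bklog", "myorg-mypipeline-123-abc-def-456.parquet")

	// Stream entries straight from the cached Parquet file, printing only commands.
	for entry, err := range buildkitelogs.ReadParquetFileIter(cached) {
		if err != nil {
			panic(err)
		}
		if entry.IsCommand() {
			fmt.Println(entry.Content)
		}
	}
}
```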
The CLI includes a debug command for troubleshooting parser corruption issues, especially useful when investigating problems with OSC sequence parsing:
Debug parser behavior on specific lines:
./build/bklog debug -file buildkite.log -start 17 -limit 5 -verbose
Output:
=== Debug Mode: parse ===
File: buildkite.log
Lines: 17-21
--- Line 17 ---
Timestamp: 2025-07-01 09:20:41.629 +1000 AEST (Unix: 1751321141)
Content: "remote: Counting objects: 0% (1/287)K_bk;t=1751321141629remote: Counting objects: 1% (3/287)K..."
Group: ""
RawLine length: 6619
IsCommand: false
IsGroup: false
Show hex dump of corrupted lines:
./build/bklog debug -file buildkite.log -mode hex -start 17 -limit 1
Output:
=== Debug Mode: hex ===
File: buildkite.log
Lines: 17-17
--- Line 17 ---
Length: 6619 bytes
00000000 1b 5f 62 6b 3b 74 3d 31 37 35 31 33 32 31 31 34 |._bk;t=175132114|
00000010 31 36 32 39 07 72 65 6d 6f 74 65 3a 20 43 6f 75 |1629.remote: Cou|
00000020 6e 74 69 6e 67 20 6f 62 6a 65 63 74 73 3a 20 20 |nting objects: |
00000030 20 30 25 20 28 31 2f 32 38 37 29 1b 5b 4b 1b 5f | 0% (1/287).[K._|
00000040 62 6b 3b 74 3d 31 37 35 31 33 32 31 31 34 31 36 |bk;t=17513211416|
Show raw line content with line numbers:
./build/bklog debug -file buildkite.log -mode lines -start 100 -limit 3
Output:
=== Debug Mode: lines ===
File: buildkite.log
Lines: 100-102
--- Line 100 ---
Raw: "\x1b_bk;t=1751321141985\aremote: Total 2113 (delta 1830), reused 2113 (delta 1830), pack-reused 0\r"
Length: 98
--- Line 101 ---
Raw: "\x1b_bk;t=1751321142039\aReceiving objects: 100% (2113/2113), 630.45 KiB | 630.00 KiB/s, done.\r"
Length: 102
Debug with combined options:
./build/bklog debug -file buildkite.log -start 50 -end 55 -verbose -raw -hexThis will show verbose parse information, raw line content, and hex dump for lines 50-55.
./build/bklog debug [options]
Required:
-file <path>: Path to log file to debug (required)
Range Options:
-start <line>: Start line number (1-based, default: 1)
-end <line>: End line number (0 = start+limit or EOF, default: 0)
-limit <num>: Number of lines to process (default: 10)
Mode Options:
-mode <mode>: Debug mode: parse, hex, lines (default: parse)
Display Options:
-verbose: Show detailed parsing information (default: false)
-raw: Show raw line content (default: false)
-hex: Show hex dump of each line (default: false)
-parsed: Show parsed log entry (default: true)
Investigating Parser Corruption: The debug command is particularly useful for investigating issues where the parser only handles the first OSC sequence per line but ignores subsequent ones, causing content corruption.
Common Issues Debugged:
- Multiple OSC sequences per line (e.g., progress updates)
- Malformed OSC sequences missing proper terminators
- ANSI escape sequences interfering with parsing
- Timestamp extraction failures
- Content/group association problems
Example Workflow:
# 1. Identify problematic lines in output
./build/bklog parse -file buildkite.log | grep -n "unexpected content"
# 2. Debug specific lines with verbose output
./build/bklog debug -file buildkite.log -start 142 -limit 1 -verbose
# 3. Examine raw bytes if needed
./build/bklog debug -file buildkite.log -start 142 -limit 1 -mode hex
# 4. Compare multiple lines to understand patterns
./build/bklog debug -file buildkite.log -start 140 -end 145 -raw
# 5. Extract all timestamps to CSV for analysis
./build/bklog debug -file buildkite.log -mode extract-timestamps -csv timestamps.csv
Extract all OSC timestamps to CSV:
./build/bklog debug -file buildkite.log -mode extract-timestamps -csv timestamps.csv
This extracts all OSC sequence timestamps from the log file into a CSV file with columns: line_number, osc_offset, timestamp_ms, timestamp_formatted.
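The resulting CSV can then be post-processed with any tooling; as a rough sketch (assuming the file has a header row with the columns listed above), the standard library is enough to flag unusually large gaps between consecutive timestamps:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"os"
	"strconv"
)

func main() {
	f, err := os.Open("timestamps.csv")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	rows, err := csv.NewReader(f).ReadAll()
	if err != nil {
		panic(err)
	}

	// Columns: line_number, osc_offset, timestamp_ms, timestamp_formatted.
	var prev int64
	for i, row := range rows {
		if i == 0 {
			continue // skip the header row (assumed present)
		}
		ms, err := strconv.ParseInt(row[2], 10, 64)
		if err != nil {
			continue
		}
		if prev != 0 && ms-prev > 5000 {
			fmt.Printf("gap of %d ms before line %s\n", ms-prev, row[0])
		}
		prev = ms
	}
}
```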
The repository includes test data files that you can use to try out the tail functionality:
View last 5 entries from the test log:
./build/bklog query -file ./testdata/bash-example.parquet -op tail -tail 5
Output:
[2025-04-22 21:43:32.739] [CMD] $ echo 'Tests passed!'
[2025-04-22 21:43:32.740] Tests passed!
[2025-04-22 21:43:32.740] [GRP] +++ End of Example tests
[2025-04-22 21:43:32.740] [CMD] $ buildkite-agent annotate --style success 'Build passed'
[2025-04-22 21:43:32.748] Annotation added
View last 10 entries (default) with JSON output:
./build/bklog query -file ./testdata/bash-example.parquet -op tail -format json
Parse the raw log file and immediately query the last 3 entries:
# First create a fresh parquet file from the raw log
./build/bklog parse -file ./testdata/bash-example.log -parquet temp.parquet
# Then query the last 3 entries
./build/bklog query -file temp.parquet -op tail -tail 3
Combine with other operations - show file info then tail:
# Get file statistics
./build/bklog query -file ./testdata/bash-example.parquet -op info
# Then view the last few entries
./build/bklog query -file ./testdata/bash-example.parquet -op tail -tail 7
Dump all entries from the test file:
./build/bklog query -file ./testdata/bash-example.parquet -op dump
Dump first 10 entries as JSON:
./build/bklog query -file ./testdata/bash-example.parquet -op dump -limit 10 -format json
./build/bklog parse [options]
Local File Options:
-file <path>: Path to Buildkite log file (use this OR API parameters below)
Buildkite API Options:
-org <slug>: Buildkite organization slug (for API access)
-pipeline <slug>: Buildkite pipeline slug (for API access)
-build <number>: Buildkite build number or UUID (for API access)
-job <id>: Buildkite job ID (for API access)
Output Options:
-json: Output as JSON instead of text
-filter <type>: Filter entries by type (group, section)
-summary: Show processing summary at the end
-groups: Show group/section information for each entry
-parquet <path>: Export to Parquet file (e.g., output.parquet)
-jsonl <path>: Export to JSON Lines file (e.g., output.jsonl)
./build/bklog query [options]
Data Source Options (choose one):
-file <path>: Path to Parquet log file (use this OR API parameters below)
Buildkite API Options:
-org <slug>: Buildkite organization slug (for API access)
-pipeline <slug>: Buildkite pipeline slug (for API access)
-build <number>: Buildkite build number or UUID (for API access)
-job <id>: Buildkite job ID (for API access)
Query Options:
-op <operation>: Query operation (list-groups, by-group, search, info, tail, seek, dump) (default: list-groups)
-group <pattern>: Group name pattern to filter by (for by-group operation)
-format <format>: Output format (text, json) (default: text)
-stats: Show query statistics (default: true)
-limit <number>: Limit number of entries returned (0 = no limit, enables early termination)
-tail <number>: Number of lines to show from end (for tail operation, default: 10)
-seek <row>: Row number to seek to (0-based, for seek operation)
-raw: Output raw log content without timestamps, groups, or other prefixes
-strip-ansi: Strip ANSI escape codes from log content
Search Options:
-pattern <regex>: Regex pattern to search for (for search operation)
-A <num>: Show NUM lines after each match (ripgrep-style)
-B <num>: Show NUM lines before each match (ripgrep-style)
-C <num>: Show NUM lines before and after each match (ripgrep-style)
-case-sensitive: Enable case-sensitive search (default: case-insensitive)
-invert-match: Show non-matching lines instead of matching ones
-reverse: Search backwards from end/seek position (useful for finding recent errors first)
-search-seek <row>: Start search from this row number (0-based, useful with -reverse)
Cache Options (API mode only):
-cache-ttl <duration>: Cache TTL for non-terminal jobs (default: 30s)
-cache-force-refresh: Force refresh cached entry (ignores cache)
-cache-url <url>: Cache storage URL (file://path, s3://bucket, etc., default: ~/.bklog)
./build/bklog debug [options]
Required:
-file <path>: Path to log file to debug (required)
Range Options:
-start <line>: Start line number (1-based, default: 1)
-end <line>: End line number (0 = start+limit or EOF, default: 0)
-limit <num>: Number of lines to process (default: 10)
Mode Options:
-mode <mode>: Debug mode: parse, hex, lines, extract-timestamps (default: parse)
Display Options:
-verbose: Show detailed parsing information (default: false)
-raw: Show raw line content (default: false)
-hex: Show hex dump of each line (default: false)
-parsed: Show parsed log entry (default: true)
-csv <path>: Output CSV file for extract-timestamps mode
Note: For API usage, set BUILDKITE_API_TOKEN environment variable. Logs are automatically downloaded and cached in ~/.bklog/.
Security: Keep your Buildkite API token secure. Never commit tokens to version control or expose them in logs. Use environment variables or secure secret management systems.
Lines that represent shell commands being executed:
[2025-04-22 21:43:29.975] $ git clone -v -- https://github.com/buildkite/bash-example.git .
Headers that mark different phases of the build (collapsible in Buildkite UI):
[2025-04-22 21:43:29.921] ~~~ Running global environment hook
[2025-04-22 21:43:30.694] --- :package: Build job checkout directory
[2025-04-22 21:43:30.699] +++ :hammer: Example tests
The parser automatically tracks which section or group each log entry belongs to:
[2025-04-22 21:43:29.921] [~~~ Running global environment hook] ~~~ Running global environment hook
[2025-04-22 21:43:29.921] [~~~ Running global environment hook] $ /buildkite/agent/hooks/environment
[2025-04-22 21:43:29.948] [~~~ Running global pre-checkout hook] ~~~ Running global pre-checkout hook
Each entry is automatically associated with the most recent group header (~~~, ---, or +++). This allows you to:
- Group related log entries by build phase
- Filter logs by group for focused analysis
- Understand build structure and timing relationships
- Export structured data with group context preserved
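For example, a short sketch using the streaming reader API documented below to pull only the entries belonging to groups whose name matches a pattern:

```go
package main

import (
	"fmt"

	buildkitelogs "github.com/buildkite/buildkite-logs"
)

func main() {
	reader := buildkitelogs.NewParquetReader("output.parquet")

	// Case-insensitive group-pattern filtering, streamed entry by entry.
	for entry, err := range reader.FilterByGroupIter("checkout") {
		if err != nil {
			panic(err)
		}
		fmt.Printf("[%s] %s\n", entry.Group, entry.Content)
	}
}
```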
The parser can export log entries to Apache Parquet format using the official Apache Arrow Go implementation for efficient storage and analysis. Parquet files can be directly queried by tools like DuckDB, Apache Spark, and Pandas for powerful log analytics.
The library uses a two-tier intelligent caching strategy that optimizes for both performance and data freshness:
flowchart TD
A[Start: DownloadAndCache] --> B[Check blob storage cache]
B --> C{Cache exists?}
C -->|No| H[Download logs from API]
C -->|Yes| D{Force refresh?}
D -->|Yes| H
D -->|No| E[Get job status]
E --> F{Job is terminal?}
F -->|Yes| G[Use cache immediately<br/>Terminal jobs never expire]
F -->|No| I{Time elapsed < TTL?}
I -->|Yes| J[Use cache<br/>Within TTL window]
I -->|No| H
H --> K[Parse logs to Parquet]
K --> L[Store in blob storage with metadata]
L --> M[Create local cache file]
G --> N[Create local cache file]
J --> N
M --> O[Return local file path]
N --> O
classDef terminal fill:#1a472a,stroke:#4ade80,color:#ffffff
classDef cache fill:#1e3a8a,stroke:#60a5fa,color:#ffffff
classDef download fill:#7c2d12,stroke:#fb923c,color:#ffffff
classDef decision fill:#374151,stroke:#9ca3af,color:#ffffff
class G,F terminal
class B,C,I,J,N cache
class H,K,L,M download
class D,E decision
Caching Strategy:
- Terminal Jobs: Once a job completes, logs never change → cache forever (no TTL check)
- Running Jobs: Logs may still be updated → respect TTL to ensure fresh data
- Force Refresh: Override cache entirely for debugging or manual refresh scenarios
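In library code these cases map onto the TTL and force-refresh arguments of Client.NewReader shown in the Quick Start. A sketch (same hypothetical org/pipeline/build/job values) that deliberately bypasses the cache:

```go
package main

import (
	"context"
	"fmt"
	"time"

	buildkitelogs "github.com/buildkite/buildkite-logs"
	"github.com/buildkite/go-buildkite/v4"
)

func main() {
	ctx := context.Background()

	bk, err := buildkite.NewOpts(buildkite.WithTokenAuth("your-token"))
	if err != nil {
		panic(err)
	}

	client, err := buildkitelogs.NewClient(ctx, bk, "file://~/.bklog")
	if err != nil {
		panic(err)
	}
	defer client.Close()

	// Same call as the Quick Start, but forceRefresh=true re-downloads the log
	// even if a cached copy exists; the 30s TTL only matters for jobs that are
	// still running (terminal jobs are otherwise served from cache indefinitely).
	reader, err := client.NewReader(ctx, "myorg", "mypipeline", "123", "job-id",
		30*time.Second, true)
	if err != nil {
		panic(err)
	}

	for entry, err := range reader.ReadEntriesIter() {
		if err != nil {
			panic(err)
		}
		fmt.Println(entry.Content)
	}
}
```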
- Columnar storage: Efficient compression and query performance
- Schema preservation: Maintains data types and structure
- Analytics ready: Compatible with Pandas, Apache Spark, DuckDB, and other data tools
- Compact size: Typically 70-90% smaller than JSON for log data
- Fast queries: Optimized for analytical workloads and filtering
The exported Parquet files contain the following columns:
| Column | Type | Description |
|---|---|---|
| timestamp | int64 | Unix timestamp in milliseconds since epoch |
| content | string | Log content after OSC sequence processing |
| group | string | Current build group/section name |
| flags | int32 | Bitwise flags field (HasTimestamp=1, IsCommand=2, IsGroup=4) |
The flags column uses bitwise operations to efficiently store multiple boolean properties:
| Flag | Bit Position | Value | Description |
|---|---|---|---|
| HasTimestamp | 0 | 1 | Entry has a valid timestamp |
| IsCommand | 1 | 2 | Entry is a shell command |
| IsGroup | 2 | 4 | Entry is a group header |
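As a quick illustration of the bit layout (plain bitwise checks rather than the library's LogFlags helper methods listed in the API Reference):

```go
package main

import "fmt"

// Values mirror the table above: HasTimestamp=1, IsCommand=2, IsGroup=4.
const (
	HasTimestamp int32 = 1 << 0
	IsCommand    int32 = 1 << 1
	IsGroup      int32 = 1 << 2
)

func main() {
	// A timestamped command line has bits 0 and 1 set.
	flags := HasTimestamp | IsCommand // == 3

	fmt.Println("has timestamp:", flags&HasTimestamp != 0) // true
	fmt.Println("is command:   ", flags&IsCommand != 0)    // true
	fmt.Println("is group:     ", flags&IsGroup != 0)      // false
}
```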
Basic export:
./build/bklog parse -file buildkite.log -parquet output.parquet
Export with filtering:
./build/bklog parse -file buildkite.log -parquet commands.parquet -filter command
Export with streaming processing:
./build/bklog parse -file buildkite.log -parquet output.parquet -summary
This uses the modern iter.Seq2[*LogEntry, error] iterator pattern for memory-efficient processing.
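A minimal sketch of that streaming path in library code, wiring Parser.All directly into the Seq2 Parquet exporter (function names as listed in the API Reference below):

```go
package main

import (
	"os"

	buildkitelogs "github.com/buildkite/buildkite-logs"
)

func main() {
	f, err := os.Open("buildkite.log")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	parser := buildkitelogs.NewParser()

	// Parser.All yields iter.Seq2[*LogEntry, error]; entries stream straight
	// into the Parquet writer without buffering the whole log in memory.
	if err := buildkitelogs.ExportSeq2ToParquet(parser.All(f), "output.parquet"); err != nil {
		panic(err)
	}
}
```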
type LogEntry struct {
Timestamp time.Time // Parsed timestamp (zero if no timestamp)
Content string // Log content after OSC sequence
RawLine []byte // Original raw log line as bytes
Group string // Current section/group this entry belongs to
}
type Parser struct {
// Internal regex patterns
}
// Create a new parser
func NewParser() *Parser
// Parse a single log line
func (p *Parser) ParseLine(line string) (*LogEntry, error)
// Create iter.Seq2 iterator with proper error handling (streaming approach)
func (p *Parser) All(reader io.Reader) iter.Seq2[*LogEntry, error]
// Strip ANSI escape sequences
func (p *Parser) StripANSI(content string) string
func (entry *LogEntry) HasTimestamp() bool
func (entry *LogEntry) CleanContent() string // Content with ANSI stripped
func (entry *LogEntry) IsCommand() bool
func (entry *LogEntry) IsGroup() bool // Check if entry is a group header (~~~, ---, +++)
func (entry *LogEntry) IsSection() bool // Deprecated: use IsGroup() instead
// Export using iter.Seq2 streaming iterator
func ExportSeq2ToParquet(seq iter.Seq2[*LogEntry, error], filename string) error
// Export using iter.Seq2 with filtering
func ExportSeq2ToParquetWithFilter(seq iter.Seq2[*LogEntry, error], filename string, filterFunc func(*LogEntry) bool) error
// Create a new Parquet writer for streaming
func NewParquetWriter(file *os.File) *ParquetWriter
// Write a batch of entries to Parquet
func (pw *ParquetWriter) WriteBatch(entries []*LogEntry) error
// Close the Parquet writer
func (pw *ParquetWriter) Close() error
// Create a new Parquet reader
func NewParquetReader(filename string) *ParquetReader
// Stream entries from a Parquet file
func ReadParquetFileIter(filename string) iter.Seq2[ParquetLogEntry, error]
// Filter streaming entries by group pattern (case-insensitive)
func FilterByGroupIter(entries iter.Seq2[ParquetLogEntry, error], groupPattern string) iter.Seq2[ParquetLogEntry, error]
// Stream all log entries from the Parquet file
func (pr *ParquetReader) ReadEntriesIter() iter.Seq2[ParquetLogEntry, error]
// Stream entries filtered by group pattern
func (pr *ParquetReader) FilterByGroupIter(groupPattern string) iter.Seq2[ParquetLogEntry, error]
type ParquetLogEntry struct {
Timestamp int64 `json:"timestamp"` // Unix timestamp in milliseconds
Content string `json:"content"` // Log content
Group string `json:"group"` // Associated group/section
Flags LogFlags `json:"flags"` // Bitwise flags (HasTimestamp=1, IsCommand=2, IsGroup=4)
}
// Backward-compatible methods
func (entry *ParquetLogEntry) HasTime() bool // Returns Flags.HasTimestamp()
func (entry *ParquetLogEntry) IsCommand() bool // Returns Flags.IsCommand()
func (entry *ParquetLogEntry) IsGroup() bool // Returns Flags.IsGroup()
type LogFlags int32
// Bitwise flag operations
func (lf LogFlags) Has(flag LogFlag) bool // Check if flag is set
func (lf *LogFlags) Set(flag LogFlag) // Set flag
func (lf *LogFlags) Clear(flag LogFlag) // Clear flag
func (lf *LogFlags) Toggle(flag LogFlag) // Toggle flag
// Convenience methods
func (lf LogFlags) HasTimestamp() bool // Check HasTimestamp flag
func (lf LogFlags) IsCommand() bool // Check IsCommand flag
func (lf LogFlags) IsGroup() bool // Check IsGroup flag
type GroupInfo struct {
Name string `json:"name"` // Group/section name
EntryCount int `json:"entry_count"` // Number of entries in group
FirstSeen time.Time `json:"first_seen"` // Timestamp of first entry
LastSeen time.Time `json:"last_seen"` // Timestamp of last entry
Commands int `json:"commands"` // Number of command entries
}
The parser includes comprehensive benchmarks to measure performance. Run them with:
go test -bench=. -benchmem
Single Line Parsing (Byte-based):
- OSC sequence with timestamp: ~64 ns/op, 192 B/op, 3 allocs/op
- Regular line (no timestamp): ~29 ns/op, 128 B/op, 2 allocs/op
- ANSI-heavy line: ~68 ns/op, 224 B/op, 3 allocs/op
Memory Usage (10,000 lines):
- Seq2 Streaming Iterator: ~3.5 MB allocated, 64,006 allocations
- Constant memory footprint regardless of file size
Streaming Throughput:
- 100 lines: ~51,000 ops/sec
- 1,000 lines: ~5,200 ops/sec
- 10,000 lines: ~510 ops/sec
- 100,000 lines: ~54 ops/sec
ANSI Stripping: ~7.7M ops/sec, 160 B/op, 2 allocs/op
Parquet Export Performance (1,000 lines, Apache Arrow):
- Seq2 streaming export: ~1,100 ops/sec, 1.2 MB allocated
Content Classification Performance (1,000 entries):
- IsCommand(): ~15,000 ops/sec, 84 KB allocated
- IsGroup(): ~14,000 ops/sec, 84 KB allocated
- CleanContent(): ~15,000 ops/sec, 84 KB allocated
Parquet Streaming Query Performance (Apache Arrow Go v18):
- ReadEntriesIter: Constant memory usage, ~5,700 entries/sec
- FilterByGroupIter: Early termination support, ~5,700 entries/sec
- Memory-efficient: Processes files of any size with constant memory footprint
Streaming Query Scalability:
- Constant memory usage regardless of file size
- Early termination support for partial processing
- Linear processing time scales with data size
- No memory allocation growth for large files
Byte-based Parser vs Regex:
- 10x faster OSC sequence parsing (~46ns vs ~477ns)
- 10x faster ANSI stripping (~127ns vs ~1311ns)
- Fewer allocations (2 vs 5 for ANSI stripping)
- Better memory efficiency for complex lines
Streaming Memory Efficiency:
- Constant memory footprint regardless of file size
- True streaming processing for files of any size
- Early termination capability with immediate resource cleanup
- Memory-safe processing of multi-gigabyte files
Run the test suite:
go test -v
Run benchmarks:
go test -bench=. -benchmem
The tests cover:
- OSC sequence parsing
- Timestamp extraction
- ANSI code stripping
- Content classification
- Stream processing
- Iterator functionality
- Memory usage patterns
This library was developed with assistance from Claude (Anthropic) for parsing, query functionality, and performance optimization.
This project is licensed under the MIT License.