git-of-theseus-go

Go implementation of git-of-theseus, a tool for analyzing how a Git repository has evolved over time.

Features

  • 📊 Analyze Git repository evolution with cohort analysis
  • 👥 Track code ownership and authorship over time
  • 📈 Visualize code survival rates
  • 🗂️ Support for file extension and directory analysis
  • 🌐 Domain-based contributor analysis
  • ⚡ Parallel processing for improved performance
  • 🏗️ Clean Architecture design

Installation

go install github.com/shiv3/git-of-theseus-go@latest

Or build from source:

git clone https://github.com/shiv3/git-of-theseus-go.git
cd git-of-theseus-go
go build

Usage

# Analyze current repository
git-of-theseus-go

# Analyze specific repository
git-of-theseus-go /path/to/repo

# Analyze specific branch
git-of-theseus-go /path/to/repo --branch develop

Command Line Options

Positional Arguments

  • [repo-path]: Path to the Git repository (default: current directory)

Optional Flags

Analysis Options

  • --branch, -b string: Branch to analyze
  • --interval string: Minimum time between commits to analyze
    • Human-readable formats: 7d, 2w, 1m, 1y, 336h
    • Raw seconds: 604800 (backward compatible)
    • Default: 7d (7 days)
  • --since string: Analyze commits since this date (YYYY-MM-DD)
  • --until string: Analyze commits until this date (YYYY-MM-DD)
  • --max-commits int: Maximum number of commits to analyze

File Filtering

  • --only string: Only analyze files matching patterns (comma-separated)
    • Example: "*.go,*.js"
  • --ignore string: Ignore files matching patterns (comma-separated)
    • Example: "*_test.go,vendor/*"
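The pattern semantics are not spelled out above, so the following is only a minimal sketch of how comma-separated --only and --ignore patterns could be applied. It assumes shell-style globs checked with Go's path.Match against both the full path and the base name. The helper names matchesAny and shouldAnalyze are hypothetical and are not taken from the tool's source.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchesAny reports whether file matches at least one of the
// comma-separated glob patterns (hypothetical helper).
func matchesAny(patterns, file string) bool {
	for _, p := range strings.Split(patterns, ",") {
		p = strings.TrimSpace(p)
		if p == "" {
			continue
		}
		// Try the full path first, then the base name, so that "*.go"
		// also matches files in subdirectories.
		if ok, _ := path.Match(p, file); ok {
			return true
		}
		if ok, _ := path.Match(p, path.Base(file)); ok {
			return true
		}
	}
	return false
}

// shouldAnalyze applies --only first and --ignore second.
// The order is an assumption made for this sketch.
func shouldAnalyze(only, ignore, file string) bool {
	if only != "" && !matchesAny(only, file) {
		return false
	}
	return !matchesAny(ignore, file)
}

func main() {
	for _, f := range []string{"cmd/main.go", "pkg/a_test.go", "vendor/lib.go", "README.md"} {
		fmt.Println(f, shouldAnalyze("*.go", "*_test.go,vendor/*", f))
	}
}
```

With path.Match, `*` does not cross a `/`, so in this sketch `vendor/*` covers only files directly inside vendor/. The real tool may treat nested paths differently.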

Performance

  • --procs int: Number of parallel processes (default: 1)
  • --quiet, -q: Suppress progress output
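To illustrate what a worker count such as --procs typically controls, here is a minimal sketch of a fixed-size worker pool. It assumes each commit can be analyzed independently. analyzeCommit and runPool are hypothetical placeholders, not the tool's actual functions.

```go
package main

import (
	"fmt"
	"sync"
)

// analyzeCommit is a hypothetical stand-in for the per-commit work
// (for example, running blame on the files of that commit).
func analyzeCommit(hash string) int { return len(hash) }

// runPool processes commits with at most procs concurrent workers and
// returns the results in the same order as the input.
func runPool(commits []string, procs int) []int {
	if procs < 1 {
		procs = 1
	}
	results := make([]int, len(commits))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < procs; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is written by exactly one worker.
				results[i] = analyzeCommit(commits[i])
			}
		}()
	}
	for i := range commits {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	fmt.Println(runPool([]string{"a1", "b22", "c333"}, 8))
}
```

Writing each result to its own slice index keeps the output order deterministic without extra locking.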

Interval Format Examples

The --interval flag accepts various time formats:

# Days
git-of-theseus-go --interval 7d      # 7 days
git-of-theseus-go --interval "14 days" # 14 days

# Weeks
git-of-theseus-go --interval 2w      # 2 weeks
git-of-theseus-go --interval "1 week"  # 1 week

# Months (approximated as 30 days)
git-of-theseus-go --interval 1m      # 1 month
git-of-theseus-go --interval "2 months" # 2 months

# Years (approximated as 365 days)
git-of-theseus-go --interval 1y      # 1 year

# Hours
git-of-theseus-go --interval 336h    # 336 hours (14 days)

# Raw seconds (for compatibility)
git-of-theseus-go --interval 604800  # 604800 seconds (7 days)

Usage Examples

Basic Analysis

# Analyze with default 7-day intervals
git-of-theseus-go /path/to/repo

# Analyze with 2-week intervals
git-of-theseus-go /path/to/repo --interval 2w

# Analyze develop branch with monthly intervals
git-of-theseus-go /path/to/repo --branch develop --interval 1m

Date Range Analysis

# Analyze only commits made during 2024
git-of-theseus-go --since 2024-01-01 --until 2024-12-31

# Analyze commits since 2024-07-01
git-of-theseus-go --since 2024-07-01

# Analyze commits up to a specific date
git-of-theseus-go --until 2024-06-30

# Combine date range with custom interval
git-of-theseus-go --since 2024-01-01 --until 2024-12-31 --interval 2w

Performance Optimization

# Use 8 parallel workers for faster processing
git-of-theseus-go /path/to/repo --procs 8

# Limit to 100 commits for quick analysis
git-of-theseus-go /path/to/repo --max-commits 100

File Filtering

# Only analyze Go source files
git-of-theseus-go --only "*.go"

# Analyze JavaScript/TypeScript, ignore tests
git-of-theseus-go --only "*.js,*.ts,*.tsx" --ignore "*test*,*.spec.*"

# Ignore vendor and node_modules directories
git-of-theseus-go --ignore "vendor/*,node_modules/*"

Output Files

The tool generates JSON files with analysis results:

File             Description
authors.json     Lines of code per author over time
cohorts.json     Code survival by year of creation
exts.json        Distribution by file extensions
dirs.json        Distribution by directories
domains.json     Distribution by email domains
survival.json    Code survival statistics

Output Format Example

authors.json:

{
  "y": [[100, 150, 200], [50, 75, 100]],
  "ts": ["2024-01-01", "2024-02-01", "2024-03-01"],
  "labels": ["Alice", "Bob"]
}
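
Judging from the example above, y[i][j] appears to hold the value for labels[i] at timestamp ts[j]. The sketch below decodes a file of that shape under this assumption and reports each label's value at the final timestamp. The Series type and the latest function are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Series mirrors the JSON layout shown in the example.
// float64 is used so that both integer and fractional values decode.
type Series struct {
	Y      [][]float64 `json:"y"`
	TS     []string    `json:"ts"`
	Labels []string    `json:"labels"`
}

// latest returns each label's value at the last timestamp.
func latest(data []byte) (map[string]float64, error) {
	var s Series
	if err := json.Unmarshal(data, &s); err != nil {
		return nil, err
	}
	if len(s.Y) != len(s.Labels) {
		return nil, fmt.Errorf("got %d rows for %d labels", len(s.Y), len(s.Labels))
	}
	out := make(map[string]float64, len(s.Labels))
	for i, label := range s.Labels {
		if len(s.Y[i]) != len(s.TS) {
			return nil, fmt.Errorf("row %d has %d values for %d timestamps", i, len(s.Y[i]), len(s.TS))
		}
		if n := len(s.Y[i]); n > 0 {
			out[label] = s.Y[i][n-1]
		}
	}
	return out, nil
}

func main() {
	data := []byte(`{"y": [[100, 150, 200], [50, 75, 100]],
		"ts": ["2024-01-01", "2024-02-01", "2024-03-01"],
		"labels": ["Alice", "Bob"]}`)
	fmt.Println(latest(data))
}
```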

Architecture

The project follows Clean Architecture principles:

git-of-theseus-go/
├── domain/          # Core business logic
│   ├── entity/      # Domain entities
│   └── repository/  # Repository interfaces
├── usecase/         # Application business rules
├── infrastructure/  # External interfaces
│   ├── git/         # Git operations
│   └── filesystem/  # File operations
└── presentation/    # UI/CLI layer
    └── cli/         # Command line interface
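
To make the dependency direction concrete, here is a single-file sketch of how the layers could relate: the domain declares an interface, the use case depends only on that interface, and the infrastructure supplies an implementation. All type and method names are hypothetical and are not taken from the repository. In the real layout each part would live in its own package.

```go
package main

import "fmt"

// Commit would live in domain/entity (hypothetical shape).
type Commit struct {
	Hash   string
	Author string
}

// CommitRepository would live in domain/repository. The domain owns
// the interface and the infrastructure implements it.
type CommitRepository interface {
	ListCommits(branch string) ([]Commit, error)
}

// CountByAuthor would live in usecase. It knows only the interface.
type CountByAuthor struct {
	Repo CommitRepository
}

// Run counts commits per author on the given branch.
func (u CountByAuthor) Run(branch string) (map[string]int, error) {
	commits, err := u.Repo.ListCommits(branch)
	if err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	for _, c := range commits {
		counts[c.Author]++
	}
	return counts, nil
}

// fakeGit stands in for an infrastructure/git implementation.
// A real one would shell out to git instead of using a slice.
type fakeGit struct {
	commits []Commit
}

func (f fakeGit) ListCommits(string) ([]Commit, error) {
	return f.commits, nil
}

func main() {
	uc := CountByAuthor{Repo: fakeGit{commits: []Commit{
		{Hash: "a1", Author: "Alice"},
		{Hash: "b2", Author: "Bob"},
		{Hash: "c3", Author: "Alice"},
	}}}
	fmt.Println(uc.Run("main"))
}
```

Because the use case depends only on the interface, it can be tested with an in-memory fake like the one above, without a Git checkout.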

Performance Features

  • Native Git Integration: Uses native git blame for better performance
  • Incremental Analysis: Caches unchanged files between commits
  • Parallel Processing: Concurrent analysis with configurable workers
  • Smart Sampling: Time-based commit sampling for large repositories
  • Optimized for Large Repos: Automatic file sampling for massive codebases

Requirements

  • Go 1.25 or later
  • Git installed and accessible in PATH
  • Read access to the target repository

Differences from Python Version

This Go implementation maintains compatibility with the original Python version while offering:

  • ⚡ Significantly faster performance through parallelization
  • 🚀 Native Git command integration
  • 💾 Better memory management for large repositories
  • 🏗️ Clean Architecture for maintainability
  • 🕐 Human-readable interval formats
  • 📅 Date range filtering with --since and --until options

License

Apache License 2.0

Acknowledgments

This is a Go reimplementation of git-of-theseus by Erik Bernhardsson.