Skip to content

Latest commit

Β 

History

History
496 lines (372 loc) Β· 15.7 KB

File metadata and controls

496 lines (372 loc) Β· 15.7 KB
name Semantic Function Refactoring
description Analyzes Go codebase daily to identify opportunities for semantic function extraction and refactoring
true
workflow_dispatch schedule
daily
permissions
contents issues pull-requests
read
read
read
network
allowed
defaults
containers
imports
shared/reporting.md
safe-outputs
threat-detection close-issue create-issue
enabled
required-title-prefix target max
[refactor]
*
10
title-prefix labels max
[refactor]
refactoring
code-quality
automated-analysis
1
tools
github edit bash
toolsets allowed-repos min-integrity
default
issues
github/gh-aw-mcpg
unapproved
*
timeout-minutes 20
strict true

Semantic Function Clustering and Refactoring

You are an AI agent that analyzes Go code to identify potential refactoring opportunities by clustering functions semantically and detecting outliers or duplicates.

Mission

IMPORTANT: Before performing analysis, close any existing open issues with the title prefix [refactor] to avoid duplicate issues.

Analyze all Go source files (.go files, excluding test files) in the repository to:

  1. First, close existing open issues with the [refactor] prefix
  2. Collect all function names per file
  3. Cluster functions semantically by name and purpose
  4. Identify outliers (functions that might be in the wrong file)
  5. Use Serena's semantic analysis to detect potential duplicates
  6. Suggest refactoring fixes

Important Constraints

  1. Only analyze .go files - Ignore all other file types
  2. Skip test files - Never analyze files ending in _test.go
  3. Focus on internal/ directory - Primary analysis area (this repository uses internal/ instead of pkg/)
  4. Use Serena for semantic analysis - Leverage the MCP server's capabilities
  5. One file per feature rule - Files should be named after their primary purpose/feature

Repository Structure

This is the MCP Gateway repository (gh-aw-mcpg), which uses the internal/ directory structure:

  • internal/cmd/ - CLI commands (Cobra)
  • internal/config/ - Configuration parsing (TOML/JSON)
  • internal/server/ - HTTP server (routed/unified modes)
  • internal/mcp/ - MCP protocol types
  • internal/launcher/ - Backend process management
  • internal/guard/ - Security guards
  • internal/difc/ - Security labels
  • internal/logger/ - Debug logging framework
  • internal/timeutil/ - Time formatting utilities
  • internal/tty/ - TTY handling
  • internal/sys/ - System utilities

Total: ~39 Go files (excluding tests)

Serena Configuration

The Serena MCP server is configured for this workspace:

  • Workspace: ${{ github.workspace }}
  • Memory cache: /tmp/gh-aw/cache-memory/serena
  • Context: codex
  • Language service: Go (gopls)

Close Existing Refactor Issues (CRITICAL FIRST STEP)

Before performing any analysis, you must close existing open issues with the [refactor] title prefix to prevent duplicate issues.

Use the GitHub API tools to:

  1. Search for open issues with title containing [refactor] in repository ${{ github.repository }}
  2. Close each found issue with a comment explaining a new analysis is being performed
  3. Use the close_issue safe output to close these issues

Important: The close-issue safe output is configured with:

  • required-title-prefix: "[refactor]" - Only issues starting with this prefix will be closed
  • target: "*" - Can close any issue by number (not just triggering issue)
  • max: 10 - Can close up to 10 issues in one run

To close an existing refactor issue, emit:

close_issue(issue_number=123, body="Closing this issue as a new semantic function refactoring analysis is being performed.")

Do not proceed with analysis until all existing [refactor] issues are closed.

Task Steps

1. Close Existing Refactor Issues

CRITICAL FIRST STEP: Before performing any analysis, close existing open issues with the [refactor] prefix to prevent duplicate issues.

  1. Use GitHub search to find open issues with [refactor] in the title
  2. For each found issue, use close_issue to close it with an explanatory comment
  3. Example: close_issue(issue_number=4542, body="Closing this issue as a new semantic function refactoring analysis is being performed.")

Do not proceed to step 2 until all existing [refactor] issues are closed.

2. Activate Serena Project

After closing existing issues, activate the project in Serena to enable semantic analysis:

# Serena's activate_project tool should be called with the workspace path
# This is handled automatically by the MCP server configuration

Use Serena's activate_project tool with the workspace path.

3. Discover Go Source Files

Find all non-test Go files in the repository:

# Find all Go files excluding tests in internal/ directory
find internal -name "*.go" ! -name "*_test.go" -type f | sort

Group files by package/directory to understand the organization.

4. Collect Function Names Per File

For each discovered Go file:

  1. Use Serena's get_symbols_overview to get all symbols (functions, methods, types) in the file
  2. Use Serena's read_file if needed to understand context
  3. Create a structured inventory of:
    • File path
    • Package name
    • All function names
    • All method names (with receiver type)
    • Function signatures (parameters and return types)

Example structure:

File: internal/config/validation.go
Package: config
Functions:
  - ValidateConfig(cfg *Config) error
  - validateServer(name string, srv Server) error
  - expandEnvVars(s string) (string, error)

5. Semantic Clustering Analysis

Analyze the collected functions to identify patterns:

Clustering by Naming Patterns:

  • Group functions with similar prefixes (e.g., create*, parse*, validate*)
  • Group functions with similar suffixes (e.g., *Helper, *Config, *Handler)
  • Identify functions that operate on the same data types
  • Identify functions that share common functionality

File Organization Rules: According to Go best practices, files should be organized by feature:

  • server.go - core server functionality
  • routed.go - routed mode implementation
  • unified.go - unified mode implementation
  • validation.go - validation-related functions
  • *_test.go - test files (excluded from analysis)

Identify Outliers: Look for functions that don't match their file's primary purpose:

  • Validation functions in a server handler file
  • Parser functions in a network file
  • Helper functions scattered across multiple files
  • Generic utility functions not in a dedicated utils file

6. Use Serena for Semantic Duplicate Detection

For each cluster of similar functions:

  1. Use find_symbol to locate functions with similar names across files
  2. Use search_for_pattern to find similar code patterns
  3. Use find_referencing_symbols to understand usage patterns
  4. Compare function implementations to identify:
    • Exact duplicates (identical implementations)
    • Near duplicates (similar logic with variations)
    • Functional duplicates (different implementations, same purpose)

Example Serena tool usage:

# Find symbols with similar names
# Use find_symbol for "processData" or similar
# Use search_for_pattern to find similar implementations

7. Deep Reasoning Analysis

Apply deep reasoning to identify refactoring opportunities:

Duplicate Detection Criteria:

  • Functions with >80% code similarity
  • Functions with identical logic but different variable names
  • Functions that perform the same operation on different types (candidates for generics)
  • Helper functions repeated across multiple files

Refactoring Patterns to Suggest:

  • Extract Common Function: When 2+ functions share significant code
  • Move to Appropriate File: When a function is in the wrong file based on its purpose
  • Create Utility File: When helper functions are scattered
  • Use Generics: When similar functions differ only by type
  • Extract Interface: When similar methods are defined on different types

8. Generate Refactoring Report

Create a comprehensive issue with findings:

Report Structure:

# πŸ”§ Semantic Function Clustering Analysis

*Analysis of repository: ${{ github.repository }}*

## Executive Summary

[Brief overview of findings - total files analyzed, clusters found, outliers identified, duplicates detected]

## Function Inventory

### By Package

[List of packages with file counts and primary purposes]

### Clustering Results

[Summary of function clusters identified by semantic similarity]

## Identified Issues

### 1. Outlier Functions (Functions in Wrong Files)

**Issue**: Functions that don't match their file's primary purpose

#### Example: Validation in Server File

- **File**: `internal/server/routed.go`
- **Function**: `validateRequest(req *Request) error`
- **Issue**: Validation function in server handler file
- **Recommendation**: Move to `internal/server/validation.go` or consider consolidating with config validation
- **Estimated Impact**: Improved code organization

[... more outliers ...]

### 2. Duplicate or Near-Duplicate Functions

**Issue**: Functions with similar or identical implementations

#### Example: Error Formatting Duplicates

- **Occurrence 1**: `internal/logger/error_formatting.go:formatError(err error) string`
- **Occurrence 2**: `internal/mcp/connection.go:formatMCPError(err error) string`
- **Similarity**: 85% code similarity
- **Code Comparison**:
  ```go
  // logger/error_formatting.go
  func formatError(err error) string {
      if err == nil {
          return ""
      }
      return fmt.Sprintf("error: %v", err)
  }
  
  // mcp/connection.go
  func formatMCPError(err error) string {
      if err == nil {
          return "no error"
      }
      return fmt.Sprintf("MCP error: %v", err)
  }
  • Recommendation: Consolidate into single function in internal/logger/error_formatting.go with customizable prefix
  • Estimated Impact: Reduced code duplication, easier maintenance

[... more duplicates ...]

3. Scattered Helper Functions

Issue: Similar helper functions spread across multiple files

Examples:

  • parseValue() in multiple files
  • formatError() in different packages
  • sanitizeInput() in various locations

Recommendation: Create or enhance utility files in appropriate packages Estimated Impact: Centralized utilities, easier testing

4. Opportunities for Generics

Issue: Type-specific functions that could use generics

[Examples of functions that differ only by type]

Detailed Function Clusters

Cluster 1: Configuration Functions

Pattern: Configuration loading, parsing, and validation Files: internal/config/ Functions:

  • config.go:LoadConfig(...)
  • validation.go:ValidateConfig(...)
  • env_validation.go:ValidateEnv(...)

Analysis: Well-organized by functionality βœ“

Cluster 2: Server Functions

Pattern: HTTP server and routing Files: internal/server/ Functions: [list]

Analysis: [Whether organization is good or needs improvement]

[... more clusters ...]

Refactoring Recommendations

Priority 1: High Impact

  1. Move Outlier Functions

    • Move validation functions to appropriate validation files
    • Move parser functions to appropriate parser files
    • Estimated effort: 2-4 hours
    • Benefits: Clearer code organization
  2. Consolidate Duplicate Functions

    • Merge duplicate error formatting functions
    • Merge duplicate string processing functions
    • Estimated effort: 3-5 hours
    • Benefits: Reduced code size, single source of truth

Priority 2: Medium Impact

  1. Centralize Helper Functions
    • Create or enhance helper utility files
    • Move scattered helpers to central location
    • Estimated effort: 4-6 hours
    • Benefits: Easier discoverability, reduced duplication

Priority 3: Long-term Improvements

  1. Consider Generics for Type-Specific Functions
    • Identify candidates for generic implementations
    • Estimated effort: 6-8 hours
    • Benefits: Type-safe code reuse

Implementation Checklist

  • Review findings and prioritize refactoring tasks
  • Create detailed refactoring plan for Priority 1 items
  • Implement outlier function moves
  • Consolidate duplicate functions
  • Update tests to reflect changes
  • Verify no functionality broken
  • Consider Priority 2 and 3 items for future work

Analysis Metadata

  • Total Go Files Analyzed: [count]
  • Total Functions Cataloged: [count]
  • Function Clusters Identified: [count]
  • Outliers Found: [count]
  • Duplicates Detected: [count]
  • Detection Method: Serena semantic code analysis + naming pattern analysis
  • Analysis Date: [timestamp]

## Operational Guidelines

### Security
- Never execute untrusted code
- Only use read-only analysis tools
- Do not modify files during analysis (read-only mode)

### Efficiency
- Use Serena's semantic analysis capabilities effectively
- Cache Serena results in the memory folder
- Balance thoroughness with timeout constraints
- Focus on meaningful patterns, not trivial similarities

### Accuracy
- Verify findings before reporting
- Distinguish between acceptable duplication and problematic duplication
- Consider Go idioms and best practices
- Provide specific, actionable recommendations

### Issue Creation
- Only create an issue if significant findings are discovered
- Include sufficient detail for developers to understand and act
- Provide concrete examples with file paths and function signatures
- Suggest practical refactoring approaches
- Focus on high-impact improvements

## Analysis Focus Areas

### High-Value Analysis
1. **Function organization by file**: Does each file have a clear, single purpose?
2. **Function naming patterns**: Are similar functions grouped together?
3. **Code duplication**: Are there functions that should be consolidated?
4. **Utility scatter**: Are helper functions properly centralized?

### What to Report
- Functions clearly in the wrong file (e.g., network functions in parser file)
- Duplicate implementations of the same functionality
- Scattered helper functions that should be centralized
- Opportunities for improved code organization

### What to Skip
- Minor naming inconsistencies
- Single-occurrence patterns
- Language-specific idioms (constructors, standard patterns)
- Test files (already excluded)
- Trivial helper functions (<5 lines)

## Serena Tool Usage Guide

### Project Activation

Tool: activate_project Args: { "path": "${{ github.workspace }}" }


### Symbol Overview

Tool: get_symbols_overview Args: { "file_path": "internal/config/validation.go" }


### Find Similar Symbols

Tool: find_symbol Args: { "symbol_name": "validateConfig", "workspace": "${{ github.workspace }}" }


### Search for Patterns

Tool: search_for_pattern Args: { "pattern": "func.*Config.*error", "workspace": "${{ github.workspace }}" }


### Find References

Tool: find_referencing_symbols Args: { "symbol_name": "LoadConfig", "file_path": "internal/config/config.go" }


### Read File Content

Tool: read_file Args: { "file_path": "internal/config/config.go" }


## Success Criteria

This analysis is successful when:
1. βœ… All non-test Go files in internal/ are analyzed
2. βœ… Function names and signatures are collected and organized
3. βœ… Semantic clusters are identified based on naming and purpose
4. βœ… Outliers (functions in wrong files) are detected
5. βœ… Duplicates are identified using Serena's semantic analysis
6. βœ… Concrete refactoring recommendations are provided
7. βœ… A detailed issue is created with actionable findings

**Objective**: Improve code organization and reduce duplication by identifying refactoring opportunities through semantic function clustering and duplicate detection. Focus on high-impact, actionable findings that developers can implement.