AGENTS.md

This file provides guidance to AI agents when working with code in this repository.

Commands

# Build
go build .

# Run
go run . [flags] grammar.y

# Check (no test suite exists; run go vet, then validate by generating a parser from a grammar)
go vet ./...

There is no Makefile; standard Go tooling is used throughout.

Architecture

This tool (package main) reads a YACC grammar (.y file) and writes a Go parser. The main logic lives in goyacc.go (~3700 lines); union layout inference is in unionsize.go (uses go/packages to inspect the target package's types at generation time).

The upstream source is cmd/goyacc in the Go x/tools repository (GitHub mirror). This fork adds the enhancements described below.

Processing pipeline

  1. Lexing: gettok() / getword() tokenize the .y input
  2. Grammar parsing: setup() reads productions, types, and directives into global arrays
  3. State generation: stagen() builds the LALR(1) automaton (states, items, lookahead sets)
  4. Table output: output() / go2out() write the action/goto tables to the output file
  5. Code emission: cppcode(), cpyact(), and related functions copy user code sections verbatim into the output

Key data structures

  • Pitem / Item — a production rule with a dot position and lookahead set
  • Symb — a grammar symbol (terminal or nonterminal)
  • Lkset — a bitset representing lookahead tokens
  • Row — one row of the action table (actions + default)
  • Error — a custom error message keyed by (state, token)
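
To make the Lkset entry above concrete, here is a minimal sketch of a lookahead bitset with one bit per token number. The type lookaheadSet and its methods are hypothetical stand-ins written for illustration; the actual Lkset in goyacc.go may be laid out differently.

```go
package main

import "fmt"

// lookaheadSet is a hypothetical stand-in for Lkset: one bit per token number.
type lookaheadSet []uint32

// newLookaheadSet allocates enough 32-bit words to hold ntokens bits.
func newLookaheadSet(ntokens int) lookaheadSet {
	return make(lookaheadSet, (ntokens+31)/32)
}

// add marks token tok as a member of the set.
func (s lookaheadSet) add(tok int) { s[tok/32] |= uint32(1) << (uint(tok) % 32) }

// has reports whether token tok is in the set.
func (s lookaheadSet) has(tok int) bool {
	return s[tok/32]&(uint32(1)<<(uint(tok)%32)) != 0
}

// union merges other into s and reports whether s changed. LALR(1)
// lookahead propagation typically repeats such merges until nothing changes.
func (s lookaheadSet) union(other lookaheadSet) bool {
	changed := false
	for i, w := range other {
		if merged := s[i] | w; merged != s[i] {
			s[i] = merged
			changed = true
		}
	}
	return changed
}

func main() {
	a := newLookaheadSet(40)
	b := newLookaheadSet(40)
	a.add(3)
	b.add(35)
	fmt.Println(a.union(b), a.has(3), a.has(35), a.has(4)) // true true true false
	fmt.Println(a.union(b))                                // false: already merged
}
```

A bitset keeps set union down to a few word-wise OR operations, which matters because lookahead sets are merged repeatedly during state generation.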

Enhancements over standard goyacc

Discriminated unsafe.Pointer union (%union): yySymType holds a data [N]uintptr array (sized to the largest member) plus a ptrs [M]unsafe.Pointer GC-keepalive array. All member types are stored/read via uintptr casts, eliminating interface boxing. Array sizes and pointer-word offsets (for the GC-keepalive array) are inferred automatically at generation time via go/packages (inferUnionLayout in unionsize.go). Typed getter (<member>()) and setter (set<member>()) methods are generated for each member.
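
The layout can be illustrated with a hand-written sketch. Everything here is an assumption made for illustration: the type symType, the members num and tree, the fixed sizes N = 1 and M = 1, and the choice to read pointers back through ptrs. The real yySymType is generated, and its sizes come from inferUnionLayout.

```go
package main

import (
	"fmt"
	"unsafe"
)

// astNode is a hypothetical pointer-typed union member.
type astNode struct{ name string }

// symType is a hand-sized stand-in for a generated yySymType.
type symType struct {
	data [1]uintptr        // N = 1: raw bits of the current member
	ptrs [1]unsafe.Pointer // M = 1: keeps a stored pointer visible to the GC
}

// setnum and num store and load a pointer-free member by viewing the
// data words as an int, so no interface boxing is involved.
func (s *symType) setnum(v int) { *(*int)(unsafe.Pointer(&s.data[0])) = v }
func (s *symType) num() int     { return *(*int)(unsafe.Pointer(&s.data[0])) }

// settree records the pointer twice: as a real pointer in ptrs, which the
// GC traces, and as plain address bits in data, which the GC ignores.
func (s *symType) settree(n *astNode) {
	s.ptrs[0] = unsafe.Pointer(n)
	s.data[0] = uintptr(s.ptrs[0])
}

// tree reads the member back through ptrs, which stays within the
// documented unsafe.Pointer conversion rules.
func (s *symType) tree() *astNode { return (*astNode)(s.ptrs[0]) }

func main() {
	var s symType
	s.setnum(42)
	fmt.Println(s.num()) // 42
	s.settree(&astNode{name: "expr"})
	fmt.Println(s.tree().name) // expr
}
```

The ptrs array exists because a uintptr is just an integer to the garbage collector: without a real pointer stored somewhere, the object a member refers to could be collected while still on the parser stack.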

Custom error messages: // error: "message" comments in grammar rules are collected into a lookup table keyed by (state, token) pair and emitted into the generated parser.
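
A lookup keyed by (state, token) might look like the sketch below. The names errorKey, errorMessages, and errorMessage, the state and token numbers, and the message text are all hypothetical; the table the generated parser actually emits may use a different shape.

```go
package main

import "fmt"

// errorKey pairs a parser state with the offending token (hypothetical type).
type errorKey struct{ state, token int }

// errorMessages stands in for the emitted lookup table; this entry is made up.
var errorMessages = map[errorKey]string{
	{state: 12, token: 57346}: "expected expression after '('",
}

// errorMessage returns the custom message for (state, token) if one was
// registered, and a generic fallback otherwise.
func errorMessage(state, token int) string {
	if msg, ok := errorMessages[errorKey{state, token}]; ok {
		return msg
	}
	return "syntax error"
}

func main() {
	fmt.Println(errorMessage(12, 57346)) // expected expression after '('
	fmt.Println(errorMessage(3, 57346))  // syntax error
}
```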

Generated output

The emitted file is valid but unformatted Go. Callers should post-process with goimports and/or gofumpt.
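
When shelling out to goimports or gofumpt is inconvenient, the standard library's go/format package can handle the formatting step. This is a swapped-in alternative rather than the recommended workflow: format.Source applies gofmt-style formatting only and does not add or remove imports the way goimports does. The helper name formatGenerated and the sample input are assumptions.

```go
package main

import (
	"fmt"
	"go/format"
)

// formatGenerated applies gofmt-style formatting to generated source.
// It returns an error if the source does not parse as Go.
func formatGenerated(src []byte) (string, error) {
	out, err := format.Source(src)
	if err != nil {
		return "", fmt.Errorf("generated parser is not valid Go: %w", err)
	}
	return string(out), nil
}

func main() {
	// Stand-in for unformatted generator output.
	raw := []byte("package p\nfunc  f( ) int {return 1}\n")
	out, err := formatGenerated(raw)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Print(out)
}
```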

Generated parsers expose:

  • yySymType — the semantic value type on the parser stack
  • yyLexer interface — Lex(lval *yySymType) int
  • yyParser interface — Parse(yylex yyLexer) int
  • yyParse(yylex yyLexer) int — convenience entry point
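
A lexer satisfying the interface listed above could be sketched as follows. The yySymType and yyLexer declarations here are local stand-ins so the example compiles on its own; in real use they come from the generated file. The num field, the NUMBER constant and its value, and sliceLexer are hypothetical. Upstream goyacc's yyLexer also declares Error(s string), so the sketch includes that method in case this fork keeps it.

```go
package main

import "fmt"

// Stand-ins for declarations the generated parser would provide.
type yySymType struct{ num int }

type yyLexer interface {
	Lex(lval *yySymType) int
}

// NUMBER is a hypothetical token constant; real values come from %token.
const NUMBER = 57346

// sliceLexer feeds pre-split integers to the parser, one token per call.
type sliceLexer struct {
	input []int
	pos   int
}

// Lex stores the next value in lval and returns its token number,
// or 0 once the input is exhausted (0 means end of input).
func (l *sliceLexer) Lex(lval *yySymType) int {
	if l.pos >= len(l.input) {
		return 0
	}
	lval.num = l.input[l.pos]
	l.pos++
	return NUMBER
}

// Error reports a syntax error; upstream goyacc calls this on the lexer.
func (l *sliceLexer) Error(s string) { fmt.Println("parse error:", s) }

// Compile-time check that sliceLexer satisfies the interface.
var _ yyLexer = (*sliceLexer)(nil)

func main() {
	lex := &sliceLexer{input: []int{4, 5}}
	var lval yySymType
	for tok := lex.Lex(&lval); tok != 0; tok = lex.Lex(&lval) {
		fmt.Println(tok, lval.num)
	}
}
```

With a real generated parser, the lexer would be passed to the entry point as yyParse(lex) instead of being drained by hand.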