vreplication: preserve MySQL JSON types from binlog#20120
vreplication: preserve MySQL JSON types from binlog#20120herbenderbler wants to merge 2 commits into
Conversation
Use MarshalSQLTo on the vstreamer binlog path instead of MarshalTo so DATE, TIME, DATETIME, and other opaque JSON types are not flattened to text. Teach the replicator to pass through preserialized SQL expressions via JSONSQLValue and AppendJSONSQL. Fixes vitessio#19880 Signed-off-by: John Claus <johnclaus@gmail.com> Co-authored-by: Cursor <cursoragent@cursor.com>
Use package-level prefix slices, check JSON_OBJECT first, trim whitespace once for detection and passthrough, and fast-reject standard JSON text roots. Signed-off-by: John Claus <johnclaus@gmail.com> Co-authored-by: Cursor <cursoragent@cursor.com>
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
There was a problem hiding this comment.
Pull request overview
This PR fixes VReplication's loss of MySQL JSON internal type tags (DATE, TIME, DATETIME, DECIMAL, BIT, BLOB) by preserving the typed SQL form of JSON values end-to-end. On the binlog read path, CellValue now uses MarshalSQLTo (which emits JSON_OBJECT(...)/JSON_ARRAY(...)/CAST(... as JSON) expressions) instead of MarshalTo (which produced plain JSON text). The replicator's apply path is taught to recognize and pass through these pre-serialized SQL expressions via new helpers JSONSQLValue and AppendJSONSQL, falling back to MarshalSQLValue/AppendMarshalSQL for legacy text JSON (e.g., from initial table copy).
Changes:
- Switch binlog cell decoding to emit typed SQL JSON expressions (
MarshalSQLTo). - Add
IsPreserializedJSONSQL,JSONSQLValue, andAppendJSONSQLhelpers that detect SQL-form JSON by top-level prefix (JSON_OBJECT(/JSON_ARRAY(/CAST() and bypass re-encoding. - Route the four VReplication apply call sites through the new helpers and add regression tests at all layers.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| go/mysql/binlog/rbr.go | Use MarshalSQLTo so JSON cell values carry SQL with type info. |
| go/mysql/binlog/rbr_test.go | Update expectations to SQL form; add DATE case. |
| go/mysql/json/marshal.go | New prefix-based detection plus JSONSQLValue / AppendJSONSQL. |
| go/mysql/json/marshal_test.go | Unit tests for new helpers and detection heuristic. |
| go/vt/vttablet/tabletmanager/vreplication/replicator_plan.go | Use new helpers at all JSON SQL-binding sites. |
| go/vt/vttablet/tabletmanager/vreplication/replicator_plan_test.go | Regression tests proving typed SQL is preserved. |
| prefixJSONObject = []byte("JSON_OBJECT(") | ||
| prefixJSONArray = []byte("JSON_ARRAY(") | ||
| prefixCAST = []byte("CAST(") |
There was a problem hiding this comment.
This feels very hacky to me. 🤔
There was a problem hiding this comment.
This feels very hacky to me. 🤔
The prefix check is a workaround for type erasure on the VReplication wire path. binlog CellValue already returns sqltypes.Expression for typed JSON, but RowToProto3 only ships bytes and the target reconstructs the cell as Type_JSON. That being said, I can’t tell SQL expressions from text JSON without either a heuristic or extra metadata (similar to json_partial_values for partial updates). The prefixes match exactly what MarshalSQLTo emits. text JSON is rejected via {/[/".
I'm happy to switch to an explicit RowChange bitmap in a follow-up (or something else) if you’d prefer that over sniffing.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #20120 +/- ##
===========================================
+ Coverage 69.67% 78.05% +8.38%
===========================================
Files 1614 37 -1577
Lines 216793 7623 -209170
===========================================
- Hits 151044 5950 -145094
+ Misses 65749 1673 -64076
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Description
VReplication was losing MySQL JSON internal types (DATE, TIME, DATETIME, etc.) on the binlog path: binary JSON was marshaled to plain text JSON before gRPC, and the replicator only re-encoded standard JSON literals on apply.
This PR keeps typed JSON through the pipeline by using
MarshalSQLToin the binlog reader and passing preserialized JSON SQL throughreplicator_planinstead of round-tripping viaMarshalSQLValue.Fixes #19880
Related Issue(s)
Fixes #19880
Checklist
Deployment Notes
User-visible correctness fix for VReplication JSON columns with typed values (e.g.
JSON_TYPEreturnsDATEinstead ofSTRINGafter replicate). No schema or flag changes.