Skip to content

wallet+db: handle SQL backend errors#1200

Draft
yyforyongyu wants to merge 9 commits intobtcsuite:adr-0006-testsfrom
yyforyongyu:handle-sql-error
Draft

wallet+db: handle SQL backend errors#1200
yyforyongyu wants to merge 9 commits intobtcsuite:adr-0006-testsfrom
yyforyongyu:handle-sql-error

Conversation

@yyforyongyu
Copy link
Copy Markdown
Collaborator

@yyforyongyu yyforyongyu commented Mar 30, 2026

Summary

  • introduce the shared SQL error framework in wallet/internal/db/err, including the shared taxonomy, wrapper, normalization helpers, stats collector, and package README
  • add backend-specific SQL error mapper packages in wallet/internal/db/postgres and wallet/internal/db/sqlite, with same-package tests and private helper details
  • add the shared wallet/internal/db/runtime helper framework for retry, unhealthy-store checks, transaction execution, and ambiguous commit handling
  • keep this PR focused on framework code only; production store wiring and runtime integration remain follow-up work once store code moves into the backend packages

Follow-up Plan

  • move store code into wallet/internal/db/postgres and wallet/internal/db/sqlite
  • integrate production normalization, stats recording, and runtime/store reaction logic in their final package locations
  • wire the shared runtime helper package on top of the backend-owned store implementations

Testing

  • make lint
  • GOWORK=off go test ./wallet/internal/db/... ./wallet
  • GOWORK=off go test ./wallet/internal/db/runtime
  • GOWORK=off go test -run '^$' -bench . ./wallet/internal/db/runtime

@yyforyongyu yyforyongyu added this to the Introduce SQL store milestone Mar 30, 2026
@yyforyongyu yyforyongyu self-assigned this Mar 30, 2026
@saubyk saubyk added this to lnd v0.22 Mar 30, 2026
@saubyk saubyk moved this from Backlog to In progress in lnd v0.22 Mar 30, 2026
@yyforyongyu yyforyongyu force-pushed the handle-sql-error branch 2 times, most recently from d666be3 to 46801d8 Compare March 30, 2026 17:25
@yyforyongyu yyforyongyu changed the base branch from sql-wallet to adr-0006-tests March 30, 2026 19:21
Copy link
Copy Markdown
Collaborator Author

@yyforyongyu yyforyongyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ✅ (gpt-5.3-codex)

Copy link
Copy Markdown
Collaborator Author

@yyforyongyu yyforyongyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🛡️

Error classification is correct — verified all PG SQLSTATE codes and SQLite result codes against official docs. Classification ordering in classifySQLError is sound (context errors first, then backend-specific, then generic connection, then permanent fallback). Stats counters are atomic. Error.Is correctly matches both class and reason sentinels.

Two minor observations (not bugs):

  1. Stats recorded during constructor error paths (PingContext/ApplyMigrations failure) are discarded since the store is never returned.
  2. A few common PG constraint codes (23502, 23P01) are not explicitly mapped but fall through correctly to permanent/unknown.

(claude-opus-4-6)

@yyforyongyu
Copy link
Copy Markdown
Collaborator Author

I've addressed both observations in the latest commit:

  1. Mapped 23502 (not_null_violation) and 23P01 (exclusion_violation) to ErrorClassPermanent and ErrorReasonConstraint.
  2. Modified the DB constructors (NewPostgresStore and NewSqliteStore) to return the instantiated store alongside the initialization error during PingContext/ApplyMigrations failures, allowing callers to inspect store.DBStatsSnapshot(). Tested via db_connectors_test.go.

Copy link
Copy Markdown
Collaborator Author

@yyforyongyu yyforyongyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🎲

The PR now includes:

  • Detailed docs for error sentinels explaining when they occur
  • Added and private sentinel types for type checking
  • Switched to a switch-based approach derived from
  • Clear documentation on pg and sqlite error specs
  • Backend resolution happens before classification in

Good work addressing all the feedback! ()

@yyforyongyu
Copy link
Copy Markdown
Collaborator Author

LGTM 🚀 (gemini-3.1-pro-preview)

Copy link
Copy Markdown
Collaborator Author

@yyforyongyu yyforyongyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🎲

The PR now includes:

  • Detailed docs for error sentinels explaining when they occur
  • Added errorClassSentinel and errorReasonSentinel private sentinel types for type checking
  • Switched reasonRecorders to a switch-based approach derived from errorReasonSentinel
  • Clear documentation on pg and sqlite error specs
  • Backend resolution happens before classification in classifySQLError

Good work addressing all the feedback! (glm5)

Add the shared SQL error taxonomy, wrapper, and normalization
helpers in the new db/err package. The API now uses shorter shared
Class and Reason names so call sites avoid the earlier stutter.

This also moves the shared error tests into the same package and adds
docs for the remaining internal helpers.
Add the PostgreSQL SQLSTATE mapper in the new db/postgres
package. The mapper converts PostgreSQL driver and transport
failures into the shared SQL error model while keeping
backend-specific helper details private to the package.

This also moves the backend-specific mapper tests into the same
package.
Add the SQLite result-code mapper in the new db/sqlite package.
The mapper converts SQLite driver errors into the shared SQL
error model while keeping backend-specific helper details
private.

This also moves the backend-specific mapper tests into the same
package.
Add the shared SQL error stats collector and snapshot types to
the db/err package. This keeps classification-driven metrics
with the shared error framework instead of wiring them through
top-level db production code first.

This also adds focused unit coverage for the shared stats
behavior.
Document the purpose of the shared db/err package and the
intended boundaries between shared error code,
backend-specific mapping, and runtime/store reaction policy.
This keeps the staged package split explicit while production
integration stays in follow-up work.
Add the shared runtime helper package for SQL backends. The
new helpers cover read retries, unhealthy-store checks,
transaction execution, and ambiguous commit wrapping without
wiring into backend store code yet.

This keeps the runtime framework reviewable while store
integration lands in follow-up work.
Add focused tests and small benchmarks for the shared runtime
helper package. The coverage exercises retry,
unhealthy-store, begin, and ambiguous-commit paths while the
benchmarks cover the steady-state and transient-retry read
flows.
Rename the shared error package declaration to err, keep dberr as\nthe import alias at call sites, and simplify the shared naming to\nClass and Reason throughout the package. The reason stringer now uses\none explicit switch and the same-package tests follow the renamed\npackage cleanly.\n\nThis also fills in the remaining mapper and helper docs so the shared\nerror framework matches the repo's Go documentation rules.
Rename the shared runtime hook interfaces and ambiguous-commit\nerror types to make their purpose clearer. Read retries now take\nexplicit caller-provided settings so future wallet or backend config\ncan own the retry policy instead of the shared helper package.\n\nThis also adds the missing helper docs and expands the tiny test\nhelpers to match the local multiline style.
Copy link
Copy Markdown
Collaborator Author

@yyforyongyu yyforyongyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🎲

All review comments have been addressed. The PR now has:

  • Clean package structure with shared err package and backend-specific postgres/sqlite packages
  • Well-documented enums with Valid() methods for backend, class, and reason
  • Clear Reason.Class() method deriving policy from reason
  • Proper normalization helper that stays side-effect free
  • README documenting package ownership boundaries

Good work on the iterations! (glm5)

@yyforyongyu
Copy link
Copy Markdown
Collaborator Author

LGTM 🚀 (gemini-3.1-pro-preview)

Copy link
Copy Markdown
Collaborator Author

@yyforyongyu yyforyongyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ✅ (gpt-5.3-codex)

Copy link
Copy Markdown
Collaborator Author

@yyforyongyu yyforyongyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🛡️

All 21 PG SQLSTATE codes and 13 SQLite result codes verified against official docs — correctly classified. Reason.Class() range partitioning is correct. Normalize() ordering is sound (context first, existing wrappers preserved, backend-specific before generic transport, permanent fallback). pgconn.PgError is matched before SafeToRetry/Timeout so SQLSTATE codes are never shadowed. Stats counters atomic throughout. Retry loop boundaries correct with proper overflow handling in retryDelay. All 13 review threads resolved.

(claude-opus-4-6)

@yyforyongyu
Copy link
Copy Markdown
Collaborator Author

LGTM 🛡️ (qwen3.5-397b-a17b)

All review threads resolved. The PR now includes:

  • Comprehensive error classification with detailed docs for each sentinel
  • Private sentinel types (errorClassSentinel, errorReasonSentinel) for type checking
  • Switch-based error classification with proper backend resolution
  • Complete SQLSTATE coverage for constraint violations (23502, 23503, 23504, 23505, 23514, 23P01)
  • Stats API exposing startup failures via DBStats()
  • Proper comment formatting throughout

Error classification logic is sound and well-documented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants