Cluster reconciliation breaks if a cluster is unhealthy #2100

@gwvandesteeg

Report

When multiple database clusters are created using the operator and one of the nodes in one of those clusters is crash-looping for any reason, the entire reconciliation process breaks, including reconciliation of the other databases.

Each defined cluster needs to be handled individually and reconciled even when errors occur in other cluster definitions.
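The expected behaviour can be sketched roughly as follows. This is not the operator's code: `Cluster`, `reconcileOne`, and `reconcileAll` are hypothetical names used only for illustration, and the failure is simulated with a `Healthy` flag. The point is only that an error from one cluster is recorded and the loop moves on to the next cluster.

```go
package main

import (
	"errors"
	"fmt"
)

// Cluster is a hypothetical stand-in for one database cluster definition.
type Cluster struct {
	Name    string
	Healthy bool
}

// reconcileOne is a hypothetical per-cluster reconcile step. It fails for an
// unhealthy cluster, imitating the "connection refused" error from the log.
func reconcileOne(c Cluster) error {
	if !c.Healthy {
		return fmt.Errorf("cluster %s: connection refused", c.Name)
	}
	return nil
}

// reconcileAll reconciles every cluster independently. A failure in one
// cluster is collected and does not stop the remaining clusters.
func reconcileAll(clusters []Cluster) (reconciled []string, err error) {
	var errs []error
	for _, c := range clusters {
		if e := reconcileOne(c); e != nil {
			errs = append(errs, e)
			continue // keep going with the other clusters
		}
		reconciled = append(reconciled, c.Name)
	}
	// errors.Join returns nil when no errors were collected.
	return reconciled, errors.Join(errs...)
}
```

With the two clusters from the log below, calling `reconcileAll` with a healthy `mysql-pxc-db` and an unhealthy `voipmonitor-db-pxc-db` would still report `mysql-pxc-db` as reconciled, while returning the error for `voipmonitor-db-pxc-db`.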

More about the problem

This is what the log entries look like:

2025-06-19T13:22:58.086Z        INFO    update PXC version (fetched from db)    {"controller": "pxc-controller", "namespace": "default", "name": "mysql-pxc-db", "reconcileID": "4f80c7dd-c6f2-4b30-8cb8-8933c05a787e", "new version": "8.0.41-32.1"}
2025-06-19T13:22:58.816Z        ERROR   failed to create db instance    {"controller": "pxc-controller", "namespace": "default", "name": "voipmonitor-db-pxc-db", "reconcileID": "e5b9eb9b-ba85-4ff3-bf3d-db4537713006", "error": "dial tcp 10.1.0.170:33062: connect: connection refused"}
github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc.(*ReconcilePerconaXtraDBCluster).mysqlVersion
        /go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc/version.go:427
github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc.(*ReconcilePerconaXtraDBCluster).handleMonitorUser
        /go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc/users.go:415
github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc.(*ReconcilePerconaXtraDBCluster).updateUsers
        /go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc/users.go:160
github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc.(*ReconcilePerconaXtraDBCluster).reconcileUsers
        /go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc/users.go:101
github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc.(*ReconcilePerconaXtraDBCluster).Reconcile
        /go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc/controller.go:362
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:119
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:334
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:294
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:255
2025-06-19T13:22:58.818Z        ERROR   failed to create db instance    {"controller": "pxc-controller", "namespace": "default", "name": "voipmonitor-db-pxc-db", "reconcileID": "e5b9eb9b

Steps to reproduce

  1. Create two clusters (3 nodes each)
  2. Get one pod in one of the clusters to start crash-looping
  3. Make a change to the custom resource of the cluster that is not crash-looping

Versions

  1. Kubernetes Server Version: v1.31.7-eks-4096722
  2. Operator pxc-operator-1.17.0
  3. Database 8.0.41-32.1

2025-06-19T13:22:01.443Z        INFO    setup   Runs on {"platform": "kubernetes", "version": "v1.31.7-eks-4096722"}
2025-06-19T13:22:01.443Z        INFO    setup   Manager starting up     {"gitCommit": "864c5b6361ff477546b39f0695db5ee259c85269", "gitBranch": "release-1-17-0", "buildTime": "2025-04-07T14:08:55Z", "goVersion": "go1.23.8", "os": "linux", "arch": "amd64"}

and

  pxc:
    Container ID:  containerd://e2d515fef2a1b40994107d77a15fcba518e4cd79e15804379a249aee2bd31c3d
    Image:         percona/percona-xtradb-cluster:8.0.41-32.1

Anything else?

No response
