Report
The backup job fails with the following error
2025-06-27 13:22:33.566 ERROR: failed to open gcomm backend connection: 110: failed to reach primary view (pc.wait_prim_timeout)
at ../../../../percona-xtradb-cluster-galera/gcomm/src/pc.cpp:connect():176
2025-06-27 13:22:33.566 ERROR: ../../../../percona-xtradb-cluster-galera/gcs/src/gcs_core.cpp:gcs_core_open():256: Failed to open backend connection: -110 (Connection timed out)
2025-06-27 13:22:34.566 INFO: gcomm: terminating thread
2025-06-27 13:22:34.566 INFO: gcomm: joining thread
2025-06-27 13:22:34.566 ERROR: ../../../../percona-xtradb-cluster-galera/gcs/src/gcs.cpp:gcs_open():1952: Failed to open channel 'mysql-cluster-pxc' at 'gcomm://mysql-cluster-pxc-4.mysql-cluster-pxc?gmcast.listen_addr=tcp://0.0.0.0:4567': -110 (Connection timed out)
2025-06-27 13:22:34.566 INFO: Shifting CLOSED -> DESTROYED (TO: 0)
2025-06-27 13:22:34.567 FATAL: Garbd exiting with error: Failed to open connection to group
at ../../../percona-xtradb-cluster-galera/garb/garb_gcs.cpp:Gcs():35
+ grep 'Will never receive state. Need to abort' /tmp/garbd.log
+ grep 'Donor is no longer in the cluster, interrupting script' /tmp/garbd.log
+ grep 'failed: Invalid argument' /tmp/garbd.log
+ '[' -f /tmp/backup-is-completed ']'
+ log ERROR 'Backup was finished unsuccessful'
+ exit 1
while the cluster is in a healthy, ready state:
kubectl get pxc -n mysql-main
NAME            ENDPOINT         STATUS   PXC   PROXYSQL   HAPROXY   AGE
mysql-cluster   192.168.24.206   ready    5                3         3d19h
stas@SkyNet temp %
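For reference, the backup entrypoint in the trace above greps the garbd log for known failure signatures before declaring the backup failed. A minimal, side-effect-free sketch of that check, reusing the error line captured above (the scratch log path is an assumption; the job itself reads /tmp/garbd.log):

```shell
# Sketch of the log check; the sample line is the actual error from the report.
LOG=/tmp/garbd.sample.log   # scratch copy, an assumption
printf '%s\n' \
  '2025-06-27 13:22:33.566 ERROR: failed to open gcomm backend connection: 110: failed to reach primary view (pc.wait_prim_timeout)' \
  > "$LOG"

# The signature that matches in this particular failure
if grep -q 'failed to reach primary view' "$LOG"; then
  STATUS=timeout
else
  STATUS=ok
fi
echo "garbd check: $STATUS"
```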
More about the problem
I have checked the cluster state and it looks healthy:
MySQL [(none)]> SELECT
-> VARIABLE_NAME, VARIABLE_VALUE
-> FROM
-> performance_schema.global_status
-> WHERE
-> VARIABLE_NAME IN (
-> 'wsrep_cluster_status',
-> 'wsrep_local_state_comment',
-> 'wsrep_ready',
-> 'wsrep_connected',
-> 'wsrep_cluster_size'
-> );
+---------------------------+----------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+---------------------------+----------------+
| wsrep_cluster_size | 5 |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_local_state_comment | Synced |
| wsrep_ready | ON |
+---------------------------+----------------+
5 rows in set (0.003 sec)
MySQL [(none)]>
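The query above was run on one node; the same check can be repeated on every PXC pod to make sure no member disagrees about the primary component. A dry-run sketch (pod names mysql-cluster-pxc-0..4 are inferred from the cluster size of 5; KUBECTL defaults to echo so nothing is executed here):

```shell
# Dry-run sketch: repeat the wsrep health query on every PXC pod.
KUBECTL="${KUBECTL:-echo kubectl}"   # drop the override to actually run it
NS=mysql-main
SQL="SELECT VARIABLE_NAME, VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME IN ('wsrep_cluster_status','wsrep_ready','wsrep_connected');"
for i in 0 1 2 3 4; do
  POD="mysql-cluster-pxc-$i"
  $KUBECTL exec -n "$NS" "$POD" -c pxc -- mysql -e "$SQL"
done
```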
To rule out network issues, I ran a debug pod in the same namespace and performed the following checks:
The pod name resolves to an IP address:
net-debug:~# nslookup mysql-cluster-pxc-4.mysql-cluster-pxc
;; Got recursion not available from 10.43.96.3
Server: 10.43.96.3
Address: 10.43.96.3#53
Name: mysql-cluster-pxc-4.mysql-cluster-pxc.mysql-main.svc.cluster.local
Address: 10.42.23.197
;; Got recursion not available from 10.43.96.3
net-debug:~#
The pod exposes the target port:
net-debug:~# nc -zv mysql-cluster-pxc-4.mysql-cluster-pxc 4567
Connection to mysql-cluster-pxc-4.mysql-cluster-pxc (10.42.23.197) 4567 port [tcp/*] succeeded!
net-debug:~#
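One thing nc cannot see: it only completes the TCP handshake. If the cluster enforces TLS on the Galera replication link (socket.ssl=yes in wsrep_provider_options) while garbd connects in plaintext, the node accepts the TCP connection but the group handshake never completes, which is consistent with pc.wait_prim_timeout. A sketch of checking for that option (the sample value below is an assumption for illustration; the real value comes from SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options' on a PXC node):

```shell
# Sample value, an assumption for illustration only; read the real one with:
#   SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options';
OPTS="gcache.size=128M; socket.ssl=yes; socket.ssl_cert=/etc/mysql/ssl/tls.crt"
case "$OPTS" in
  *socket.ssl=yes*) LINK=tls ;;
  *)                LINK=plaintext ;;
esac
echo "galera replication link: $LINK"
```

If the link is TLS, garbd must be started with matching socket.ssl options or it will time out exactly as shown above.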
No issues on the PXC node side:
2025-06-27T13:39:48.312124Z 31341 [Note] [MY-000000] [Galera] after_statement: success(31341,exec,local,success,0,toi: -1,nbo: -1)
2025-06-27T13:39:48.312160Z 31341 [Note] [MY-000000] [Galera] after_statement: enter(31341,exec,local,success,0,toi: -1,nbo: -1)
2025-06-27T13:39:48.312188Z 31341 [Note] [MY-000000] [Galera] after_statement_enter
server: 56ff5a7b-5331-11f0-b9f9-4ae7d6ef3935, client: 31341, state: exec, mode: local
trx_id: -1, seqno: -1, flags: 0
state: aborted, bfa_state: executing, error: success, status: 0
is_sr: 0, frags: 0, frags size: 0, unit: 0, size: 0, counter: 0, log_pos: 0, sr_rb: 0
own: 1 thread_id: 7f7f010ee640
2025-06-27T13:39:48.312213Z 31341 [Note] [MY-000000] [Galera] cleanup_enter
server: 56ff5a7b-5331-11f0-b9f9-4ae7d6ef3935, client: 31341, state: exec, mode: local
trx_id: -1, seqno: -1, flags: 0
state: aborted, bfa_state: executing, error: success, status: 0
is_sr: 0, frags: 0, frags size: 0, unit: 0, size: 0, counter: 0, log_pos: 0, sr_rb: 0
own: 1 thread_id: 7f7f010ee640
2025-06-27T13:39:48.312239Z 31341 [Note] [MY-000000] [Galera] cleanup_leave
server: 56ff5a7b-5331-11f0-b9f9-4ae7d6ef3935, client: 31341, state: exec, mode: local
trx_id: -1, seqno: -1, flags: 0
state: aborted, bfa_state: executing, error: success, status: 0
is_sr: 0, frags: 0, frags size: 0, unit: 0, size: 0, counter: 0, log_pos: 0, sr_rb: 0
own: 1 thread_id: 7f7f010ee640
2025-06-27T13:39:48.312265Z 31341 [Note] [MY-000000] [Galera] after_statement_leave
server: 56ff5a7b-5331-11f0-b9f9-4ae7d6ef3935, client: 31341, state: exec, mode: local
trx_id: -1, seqno: -1, flags: 0
state: aborted, bfa_state: executing, error: success, status: 0
is_sr: 0, frags: 0, frags size: 0, unit: 0, size: 0, counter: 0, log_pos: 0, sr_rb: 0
own: 1 thread_id: 7f7f010ee640
2025-06-27T13:39:48.312284Z 31341 [Note] [MY-000000] [Galera] after_statement: success(31341,exec,local,success,0,toi: -1,nbo: -1)
2025-06-27T13:39:48.312306Z 31341 [Note] [MY-000000] [Galera] after_command_before_result: enter(31341,exec,local,success,0,toi: -1,nbo: -1)
2025-06-27T13:39:48.312324Z 31341 [Note] [MY-000000] [Galera] after_command_before_result: leave(31341,result,local,success,0,toi: -1,nbo: -1)
2025-06-27T13:39:48.312433Z 31341 [Note] [MY-000000] [Galera] after_command_after_result_enter(31341,result,local,success,0,toi: -1,nbo: -1)
2025-06-27T13:39:48.312461Z 31341 [Note] [MY-000000] [Galera] after_command_after_result: leave(31341,idle,local,success,0,toi: -1,nbo: -1)
2025-06-27T13:39:48.312920Z 31341 [Note] [MY-000000] [Galera] before_command: enter(31341,idle,local,success,0,toi: -1,nbo: -1)
2025-06-27T13:39:48.312945Z 31341 [Note] [MY-000000] [Galera] before_command: success(31341,exec,local,success,0,toi: -1,nbo: -1)
2025-06-27T13:39:48.312968Z 31341 [Note] [MY-000000] [Galera] after_command_before_result: enter(31341,exec,local,success,0,toi: -1,nbo: -1)
2025-06-27T13:39:48.312987Z 31341 [Note] [MY-000000] [Galera] after_command_before_result: leave(31341,result,local,success,0,toi: -1,nbo: -1)
2025-06-27T13:39:48.313011Z 31341 [Note] [MY-000000] [Galera] after_command_after_result_enter(31341,result,local,success,0,toi: -1,nbo: -1)
2025-06-27T13:39:48.313033Z 31341 [Note] [MY-000000] [Galera] after_command_after_result: leave(31341,idle,local,success,0,toi: -1,nbo: -1)
2025-06-27T13:39:48.313052Z 31341 [Note] [MY-000000] [Galera] close: enter(31341,idle,local,success,0,toi: -1,nbo: -1)
Steps to reproduce
- Deploy the custom resource from the release cr.yaml manifest:
...
backup:
  # allowParallel: true
  image: percona/percona-xtradb-cluster-operator:1.17.0-pxc8.0-backup-pxb8.0.35
  backoffLimit: 3
  # activeDeadlineSeconds: 3600
  # startingDeadlineSeconds: 300
  # suspendedDeadlineSeconds: 1200
  serviceAccountName: percona-xtradb-cluster-operator
  # imagePullSecrets:
  #   - name: private-registry-credentials
  storages:
    minio:
      type: s3
      verifyTLS: true
      s3:
        bucket: percona-operator
        region: us-east-1
        endpointUrl: https://minio.mydomain.net
        credentialsSecret: mysql-cluster-s3-credentials
      resources:
        requests:
          memory: 1G
          cpu: 600m
...
- Deploy the backup.yaml manifest:
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterBackup
metadata:
  namespace: mysql-main
  finalizers:
    - percona.com/delete-backup
  name: test-backup
spec:
  pxcCluster: mysql-cluster
  storageName: minio
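The manifest can be applied and the backup object watched as follows; a dry-run sketch (the file name backup.yaml and the pxc-backup CRD short name are assumptions; KUBECTL defaults to echo so the sketch is side-effect free):

```shell
# Dry-run sketch: apply the backup manifest and inspect the resulting object.
KUBECTL="${KUBECTL:-echo kubectl}"   # drop the override to actually run it
$KUBECTL apply -f backup.yaml
$KUBECTL get pxc-backup -n mysql-main test-backup -o wide
```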
Versions
- Kubernetes - v1.24.17
- Operator - 1.17.0
- Database - 8.0.41-32.1
Anything else?
No response