
garbd fails to connect to PXC cluster during backup job #2105

@lev-stas

Description


Report

The backup job fails with the following error:

2025-06-27 13:22:33.566 ERROR: failed to open gcomm backend connection: 110: failed to reach primary view (pc.wait_prim_timeout)
        at ../../../../percona-xtradb-cluster-galera/gcomm/src/pc.cpp:connect():176
2025-06-27 13:22:33.566 ERROR: ../../../../percona-xtradb-cluster-galera/gcs/src/gcs_core.cpp:gcs_core_open():256: Failed to open backend connection: -110 (Connection timed out)
2025-06-27 13:22:34.566  INFO: gcomm: terminating thread
2025-06-27 13:22:34.566  INFO: gcomm: joining thread
2025-06-27 13:22:34.566 ERROR: ../../../../percona-xtradb-cluster-galera/gcs/src/gcs.cpp:gcs_open():1952: Failed to open channel 'mysql-cluster-pxc' at 'gcomm://mysql-cluster-pxc-4.mysql-cluster-pxc?gmcast.listen_addr=tcp://0.0.0.0:4567': -110 (Connection timed out)
2025-06-27 13:22:34.566  INFO: Shifting CLOSED -> DESTROYED (TO: 0)
2025-06-27 13:22:34.567 FATAL: Garbd exiting with error: Failed to open connection to group
        at ../../../percona-xtradb-cluster-galera/garb/garb_gcs.cpp:Gcs():35
+ grep 'Will never receive state. Need to abort' /tmp/garbd.log
+ grep 'Donor is no longer in the cluster, interrupting script' /tmp/garbd.log
+ grep 'failed: Invalid argument' /tmp/garbd.log
+ '[' -f /tmp/backup-is-completed ']'
+ log ERROR 'Backup was finished unsuccessful'
+ exit 1
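
The backup job starts garbd, which has to join the cluster's group communication layer on port 4567 and here times out waiting for the primary component (pc.wait_prim_timeout). For reference, the same join attempt can be reproduced manually from a debug pod in the namespace. This is only a rough sketch: the address and group name are copied from the garbd log above, while the explicit pc.wait_prim_timeout override and the log path are illustrative.

# Reproduce the join attempt the backup job makes; address and group name
# are taken from the garbd log above, the longer timeout is illustrative.
garbd \
  --address "gcomm://mysql-cluster-pxc-4.mysql-cluster-pxc?gmcast.listen_addr=tcp://0.0.0.0:4567" \
  --group mysql-cluster-pxc \
  --options "pc.wait_prim_timeout=PT60S" \
  --log /tmp/garbd-manual.log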

while the cluster is in a healthy and ready state:

  kubectl get pxc -n mysql-main    
NAME            ENDPOINT         STATUS   PXC   PROXYSQL   HAPROXY   AGE
mysql-cluster   192.168.24.206   ready    5                3         3d19h

More about the problem

I have checked the cluster state and it looks healthy:

MySQL [(none)]> SELECT 
    ->   VARIABLE_NAME, VARIABLE_VALUE 
    -> FROM 
    ->   performance_schema.global_status 
    -> WHERE 
    ->   VARIABLE_NAME IN (
    ->     'wsrep_cluster_status', 
    ->     'wsrep_local_state_comment', 
    ->     'wsrep_ready', 
    ->     'wsrep_connected', 
    ->     'wsrep_cluster_size'
    ->   );
+---------------------------+----------------+
| VARIABLE_NAME             | VARIABLE_VALUE |
+---------------------------+----------------+
| wsrep_cluster_size        | 5              |
| wsrep_cluster_status      | Primary        |
| wsrep_connected           | ON             |
| wsrep_local_state_comment | Synced         |
| wsrep_ready               | ON             |
+---------------------------+----------------+
5 rows in set (0.003 sec)

MySQL [(none)]>
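
The same status can be checked on every member, not only the node I connected to. A sketch, assuming the operator's default pxc container name and a ROOT_PASSWORD variable holding the root credentials:

# Check wsrep_cluster_status on all five members (container name and
# credential handling are assumptions based on the operator defaults).
for i in 0 1 2 3 4; do
  kubectl exec -n mysql-main mysql-cluster-pxc-$i -c pxc -- \
    mysql -uroot -p"$ROOT_PASSWORD" -Nse \
      "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status'"
done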

To rule out network issues, I ran a debug pod in the same namespace and performed the following checks.

The pod name can be resolved to an IP:

net-debug:~# nslookup mysql-cluster-pxc-4.mysql-cluster-pxc
;; Got recursion not available from 10.43.96.3
Server:         10.43.96.3
Address:        10.43.96.3#53

Name:   mysql-cluster-pxc-4.mysql-cluster-pxc.mysql-main.svc.cluster.local
Address: 10.42.23.197
;; Got recursion not available from 10.43.96.3

net-debug:~# 

The pod exposes the target port:

net-debug:~# nc -zv mysql-cluster-pxc-4.mysql-cluster-pxc 4567
Connection to mysql-cluster-pxc-4.mysql-cluster-pxc (10.42.23.197) 4567 port [tcp/*] succeeded!
net-debug:~# 
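
Galera uses 4567 for group communication, 4568 for IST, and 4444 for SST. garbd itself only needs 4567, but the other ports can be checked the same way from the debug pod:

# Check all Galera-related ports on the target pod.
for port in 4567 4568 4444; do
  nc -zv mysql-cluster-pxc-4.mysql-cluster-pxc $port
done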

There are no issues on the PXC node side:

2025-06-27T13:39:48.312124Z 31341 [Note] [MY-000000] [Galera] after_statement: success(31341,exec,local,success,0,toi: -1,nbo: -1)
2025-06-27T13:39:48.312160Z 31341 [Note] [MY-000000] [Galera] after_statement: enter(31341,exec,local,success,0,toi: -1,nbo: -1)
2025-06-27T13:39:48.312188Z 31341 [Note] [MY-000000] [Galera] after_statement_enter
    server: 56ff5a7b-5331-11f0-b9f9-4ae7d6ef3935, client: 31341, state: exec, mode: local
    trx_id: -1, seqno: -1, flags: 0
    state: aborted, bfa_state: executing, error: success, status: 0
    is_sr: 0, frags: 0, frags size: 0, unit: 0, size: 0, counter: 0, log_pos: 0, sr_rb: 0
    own: 1 thread_id: 7f7f010ee640
2025-06-27T13:39:48.312213Z 31341 [Note] [MY-000000] [Galera] cleanup_enter
    server: 56ff5a7b-5331-11f0-b9f9-4ae7d6ef3935, client: 31341, state: exec, mode: local
    trx_id: -1, seqno: -1, flags: 0
    state: aborted, bfa_state: executing, error: success, status: 0
    is_sr: 0, frags: 0, frags size: 0, unit: 0, size: 0, counter: 0, log_pos: 0, sr_rb: 0
    own: 1 thread_id: 7f7f010ee640
2025-06-27T13:39:48.312239Z 31341 [Note] [MY-000000] [Galera] cleanup_leave
    server: 56ff5a7b-5331-11f0-b9f9-4ae7d6ef3935, client: 31341, state: exec, mode: local
    trx_id: -1, seqno: -1, flags: 0
    state: aborted, bfa_state: executing, error: success, status: 0
    is_sr: 0, frags: 0, frags size: 0, unit: 0, size: 0, counter: 0, log_pos: 0, sr_rb: 0
    own: 1 thread_id: 7f7f010ee640
2025-06-27T13:39:48.312265Z 31341 [Note] [MY-000000] [Galera] after_statement_leave
    server: 56ff5a7b-5331-11f0-b9f9-4ae7d6ef3935, client: 31341, state: exec, mode: local
    trx_id: -1, seqno: -1, flags: 0
    state: aborted, bfa_state: executing, error: success, status: 0
    is_sr: 0, frags: 0, frags size: 0, unit: 0, size: 0, counter: 0, log_pos: 0, sr_rb: 0
    own: 1 thread_id: 7f7f010ee640
2025-06-27T13:39:48.312284Z 31341 [Note] [MY-000000] [Galera] after_statement: success(31341,exec,local,success,0,toi: -1,nbo: -1)
2025-06-27T13:39:48.312306Z 31341 [Note] [MY-000000] [Galera] after_command_before_result: enter(31341,exec,local,success,0,toi: -1,nbo: -1)
2025-06-27T13:39:48.312324Z 31341 [Note] [MY-000000] [Galera] after_command_before_result: leave(31341,result,local,success,0,toi: -1,nbo: -1)
2025-06-27T13:39:48.312433Z 31341 [Note] [MY-000000] [Galera] after_command_after_result_enter(31341,result,local,success,0,toi: -1,nbo: -1)
2025-06-27T13:39:48.312461Z 31341 [Note] [MY-000000] [Galera] after_command_after_result: leave(31341,idle,local,success,0,toi: -1,nbo: -1)
2025-06-27T13:39:48.312920Z 31341 [Note] [MY-000000] [Galera] before_command: enter(31341,idle,local,success,0,toi: -1,nbo: -1)
2025-06-27T13:39:48.312945Z 31341 [Note] [MY-000000] [Galera] before_command: success(31341,exec,local,success,0,toi: -1,nbo: -1)
2025-06-27T13:39:48.312968Z 31341 [Note] [MY-000000] [Galera] after_command_before_result: enter(31341,exec,local,success,0,toi: -1,nbo: -1)
2025-06-27T13:39:48.312987Z 31341 [Note] [MY-000000] [Galera] after_command_before_result: leave(31341,result,local,success,0,toi: -1,nbo: -1)
2025-06-27T13:39:48.313011Z 31341 [Note] [MY-000000] [Galera] after_command_after_result_enter(31341,result,local,success,0,toi: -1,nbo: -1)
2025-06-27T13:39:48.313033Z 31341 [Note] [MY-000000] [Galera] after_command_after_result: leave(31341,idle,local,success,0,toi: -1,nbo: -1)
2025-06-27T13:39:48.313052Z 31341 [Note] [MY-000000] [Galera] close: enter(31341,idle,local,success,0,toi: -1,nbo: -1)
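
To see whether the garbd join attempt leaves any trace on the node it tries to contact, that node's log can be filtered around the failure time. A sketch; the pxc container name is the operator default and the grep patterns are only a starting point:

# Look for the garbd join attempt (connections, handshakes, view changes)
# on the node garbd tried to reach.
kubectl logs -n mysql-main mysql-cluster-pxc-4 -c pxc --since=1h \
  | grep -Ei 'garb|connection established|handshake|declaring|suspecting'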

Steps to reproduce

  1. Deploy the custom resource from the release cr.yaml manifest:
...
backup:
#    allowParallel: true
    image: percona/percona-xtradb-cluster-operator:1.17.0-pxc8.0-backup-pxb8.0.35
    backoffLimit: 3
#    activeDeadlineSeconds: 3600
#    startingDeadlineSeconds: 300
#    suspendedDeadlineSeconds: 1200
    serviceAccountName: percona-xtradb-cluster-operator
#    imagePullSecrets:
#      - name: private-registry-credentials
   
    storages:
      minio:
        type: s3
        verifyTLS: true
        s3:
          bucket: percona-operator
          region: us-east-1
          endpointUrl: https://minio.mydomain.net
          credentialsSecret: mysql-cluster-s3-credentials
        resources:
          requests:
            memory: 1G
            cpu: 600m
...
  2. Deploy the backup.yaml manifest (commands for watching the backup follow after the manifest):
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterBackup
metadata:
  namespace: mysql-main
  finalizers:
    - percona.com/delete-backup
  name: test-backup
spec:
  pxcCluster: mysql-cluster
  storageName: minio
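
After applying the manifest, the backup object and the job it creates can be watched. A sketch, assuming the pxc-backup short name is registered and that the operator names the job xb-<backup-name>:

kubectl apply -f backup.yaml
kubectl get pxc-backup -n mysql-main -w
# Job name is an assumption based on the usual xb-<backup-name> convention:
kubectl logs -n mysql-main job/xb-test-backup -f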

Versions

  1. Kubernetes - v1.24.17
  2. Operator - 1.17.0
  3. Database - 8.0.41-32.1

Anything else?

No response
