
PMM-client cannot handle replicas: duplicate metrics and failed custom queries #1259

@AndrinGautschi

Description


Report

After setting up a postgres cluster with percona-postgres-operator and enabling pmm-client on it, the pmm-client sidecars that are attached to replica postgres instances spam the following errors:

time="2025-08-25T15:29:02.407+00:00" level=info msg="ts=2025-08-25T15:29:02.407Z caller=percona_exporter.go:86 msg=\"Excluded databases\" databases=\"[template0 template1 cloudsqladmin pmm-managed-dev azure_maintenance rdsadmin]\"" agentID=cd482abd-7606-4364-a30b-3105066e76e7 component=agent-process type=postgres_exporter
time="2025-08-25T15:29:07.094+00:00" level=info msg="ts=2025-08-25T15:29:07.094Z caller=percona_exporter.go:86 msg=\"Excluded databases\" databases=\"[template0 template1 cloudsqladmin pmm-managed-dev azure_maintenance rdsadmin]\"" agentID=cd482abd-7606-4364-a30b-3105066e76e7 component=agent-process type=postgres_exporter
time="2025-08-25T15:29:07.114+00:00" level=error msg="ts=2025-08-25T15:29:07.113Z caller=namespace.go:236 level=error err=\"Error running query on database \\\"127.0.0.1:5432\\\": pg_custom_replication_wal pq: recovery is in progress\"" agentID=cd482abd-7606-4364-a30b-3105066e76e7 component=agent-process type=postgres_exporter
time="2025-08-25T15:29:07.117+00:00" level=error msg="ts=2025-08-25T15:29:07.117Z caller=namespace.go:236 level=error err=\"Error running query on database \\\"127.0.0.1:5432\\\": pg_custom_replication_wal pq: recovery is in progress\"" agentID=cd482abd-7606-4364-a30b-3105066e76e7 component=agent-process type=postgres_exporter
time="2025-08-25T15:29:07.119+00:00" level=error msg="ts=2025-08-25T15:29:07.119Z caller=server.go:130 level=error msg=\"NAMESPACE ERRORS FOUND\"" agentID=cd482abd-7606-4364-a30b-3105066e76e7 component=agent-process type=postgres_exporter
time="2025-08-25T15:29:07.119+00:00" level=error msg="ts=2025-08-25T15:29:07.119Z caller=server.go:132 level=error namespace=pg_custom_replication_wal msg=\"Error running query on database \\\"127.0.0.1:5432\\\": pg_custom_replication_wal pq: recovery is in progress\"" agentID=cd482abd-7606-4364-a30b-3105066e76e7 component=agent-process type=postgres_exporter
time="2025-08-25T15:29:07.119+00:00" level=error msg="ts=2025-08-25T15:29:07.119Z caller=postgres_exporter.go:770 level=error err=\"queryNamespaceMappings returned 1 errors\"" agentID=cd482abd-7606-4364-a30b-3105066e76e7 component=agent-process type=postgres_exporter
time="2025-08-25T15:29:07.124+00:00" level=error msg="ts=2025-08-25T15:29:07.124Z caller=server.go:130 level=error msg=\"NAMESPACE ERRORS FOUND\"" agentID=cd482abd-7606-4364-a30b-3105066e76e7 component=agent-process type=postgres_exporter
time="2025-08-25T15:29:07.124+00:00" level=error msg="ts=2025-08-25T15:29:07.124Z caller=server.go:132 level=error namespace=pg_custom_replication_wal msg=\"Error running query on database \\\"127.0.0.1:5432\\\": pg_custom_replication_wal pq: recovery is in progress\"" agentID=cd482abd-7606-4364-a30b-3105066e76e7 component=agent-process type=postgres_exporter
time="2025-08-25T15:29:07.124+00:00" level=error msg="ts=2025-08-25T15:29:07.124Z caller=postgres_exporter.go:770 level=error err=\"queryNamespaceMappings returned 1 errors\"" agentID=cd482abd-7606-4364-a30b-3105066e76e7 component=agent-process type=postgres_exporter
time="2025-08-25T15:29:07.125+00:00" level=info msg="ts=2025-08-25T15:29:07.125Z caller=log.go:245 msg=\"handlererror gathering metrics: 11 error(s) occurred:\\n* collected metric \\\"pg_custom_database_size_custom_bytes\\\" { label:{name:\\\"datname\\\"  value:\\\"template1\\\"}  label:{name:\\\"server\\\"  value:\\\"127.0.0.1:5432\\\"}  gauge:{value:7.930383e+06}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_database_size_custom_bytes\\\" { label:{name:\\\"datname\\\"  value:\\\"template0\\\"}  label:{name:\\\"server\\\"  value:\\\"127.0.0.1:5432\\\"}  gauge:{value:7.725583e+06}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_database_size_custom_bytes\\\" { label:{name:\\\"datname\\\"  value:\\\"postgres\\\"}  label:{name:\\\"server\\\"  value:\\\"127.0.0.1:5432\\\"}  gauge:{value:8.113299e+06}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_database_size_custom_bytes\\\" { label:{name:\\\"datname\\\"  value:\\\"iamapp\\\"}  label:{name:\\\"server\\\"  value:\\\"127.0.0.1:5432\\\"}  gauge:{value:8.039571e+06}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_wal_receiver_written_lsn_bytes\\\" { label:{name:\\\"primary_host\\\"  value:\\\"iam-postgres-cluster-instance1-bn7t-0.iam-postgres-cluster-pods\\\"}  label:{name:\\\"server\\\"  value:\\\"127.0.0.1:5432\\\"}  label:{name:\\\"status\\\"  value:\\\"streaming\\\"}  gauge:{value:3.01989888e+08}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_wal_receiver_latest_end_lsn_bytes\\\" { label:{name:\\\"primary_host\\\"  value:\\\"iam-postgres-cluster-instance1-bn7t-0.iam-postgres-cluster-pods\\\"}  label:{name:\\\"server\\\"  value:\\\"127.0.0.1:5432\\\"}  label:{name:\\\"status\\\"  value:\\\"streaming\\\"}  gauge:{value:3.01989888e+08}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_wal_receiver_lag_bytes\\\" { label:{name:\\\"primary_host\\\"  value:\\\"iam-postgres-cluster-instance1-bn7t-0.iam-postgres-cluster-pods\\\"}  label:{name:\\\"server\\\"  value:\\\"127.0.0.1:5432\\\"}  label:{name:\\\"status\\\"  value:\\\"streaming\\\"}  gauge:{value:0}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_wal_receiver_latest_end_time_seconds\\\" { label:{name:\\\"primary_host\\\"  value:\\\"iam-postgres-cluster-instance1-bn7t-0.iam-postgres-cluster-pods\\\"}  label:{name:\\\"server\\\"  value:\\\"127.0.0.1:5432\\\"}  label:{name:\\\"status\\\"  value:\\\"streaming\\\"}  gauge:{value:1.756135467307919e+09}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_wal_receiver_lag_time_seconds\\\" { label:{name:\\\"primary_host\\\"  value:\\\"iam-postgres-cluster-instance1-bn7t-0.iam-postgres-cluster-pods\\\"}  label:{name:\\\"server\\\"  value:\\\"127.0.0.1:5432\\\"}  label:{name:\\\"status\\\"  value:\\\"streaming\\\"}  gauge:{value:279.816133}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_wal_receiver_last_msg_send_time_seconds\\\" { label:{name:\\\"primary_host\\\"  value:\\\"iam-postgres-cluster-instance1-bn7t-0.iam-postgres-cluster-pods\\\"}  label:{name:\\\"server\\\"  value:\\\"127.0.0.1:5432\\\"}  label:{name:\\\"status\\\"  value:\\\"streaming\\\"}  gauge:{value:1.7561357382459e+09}} was collected before with the same 
name and label values\\n* collected metric \\\"pg_custom_stat_wal_receiver_last_msg_receipt_time_seconds\\\" { label:{name:\\\"primary_host\\\"  value:\\\"iam-postgres-cluster-instance1-bn7t-0.iam-postgres-cluster-pods\\\"}  label:{name:\\\"server\\\"  value:\\\"127.0.0.1:5432\\\"}  label:{name:\\\"status\\\"  value:\\\"streaming\\\"}  gauge:{value:1.756135738258575e+09}} was collected before with the same name and label values\"" agentID=cd482abd-7606-4364-a30b-3105066e76e7 component=agent-process type=postgres_exporter

Meanwhile, the pmm-client sidecar running next to the primary instance reports:

time="2025-08-25T15:31:53.632+00:00" level=info msg="ts=2025-08-25T15:31:53.631Z caller=percona_exporter.go:86 msg=\"Excluded databases\" databases=\"[template0 template1 cloudsqladmin pmm-managed-dev azure_maintenance rdsadmin]\"" agentID=de1550a3-b421-4f31-a35e-9400e55448fe component=agent-process type=postgres_exporter
time="2025-08-25T15:31:53.658+00:00" level=info msg="ts=2025-08-25T15:31:53.658Z caller=log.go:245 msg=\"handlererror gathering metrics: 22 error(s) occurred:\\n* collected metric \\\"pg_custom_stat_activity_walsender_pid\\\" { label:{name:\\\"application_name\\\"  value:\\\"iam-postgres-cluster-instance1-2lg8-0\\\"}  label:{name:\\\"client_addr\\\"  value:\\\"10.42.5.19\\\"}  label:{name:\\\"server\\\"  value:\\\"127.0.0.1:5432\\\"}  label:{name:\\\"state\\\"  value:\\\"active\\\"}  label:{name:\\\"usename\\\"  value:\\\"_crunchyrepl\\\"}  gauge:{value:752}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_activity_walsender_backend_start_unix\\\" { label:{name:\\\"application_name\\\"  value:\\\"iam-postgres-cluster-instance1-2lg8-0\\\"}  label:{name:\\\"client_addr\\\"  value:\\\"10.42.5.19\\\"}  label:{name:\\\"server\\\"  value:\\\"127.0.0.1:5432\\\"}  label:{name:\\\"state\\\"  value:\\\"active\\\"}  label:{name:\\\"usename\\\"  value:\\\"_crunchyrepl\\\"}  gauge:{value:1.756135133033277e+09}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_activity_walsender_pid\\\" { label:{name:\\\"application_name\\\"  value:\\\"iam-postgres-cluster-instance1-4l48-0\\\"}  label:{name:\\\"client_addr\\\"  value:\\\"10.42.4.19\\\"}  label:{name:\\\"server\\\"  value:\\\"127.0.0.1:5432\\\"}  label:{name:\\\"state\\\"  value:\\\"active\\\"}  label:{name:\\\"usename\\\"  value:\\\"_crunchyrepl\\\"}  gauge:{value:374}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_activity_walsender_backend_start_unix\\\" { label:{name:\\\"application_name\\\"  value:\\\"iam-postgres-cluster-instance1-4l48-0\\\"}  label:{name:\\\"client_addr\\\"  value:\\\"10.42.4.19\\\"}  label:{name:\\\"server\\\"  value:\\\"127.0.0.1:5432\\\"}  label:{name:\\\"state\\\"  value:\\\"active\\\"}  label:{name:\\\"usename\\\"  value:\\\"_crunchyrepl\\\"}  gauge:{value:1.756135109421014e+09}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_database_size_custom_bytes\\\" { label:{name:\\\"datname\\\"  value:\\\"template1\\\"}  label:{name:\\\"server\\\"  value:\\\"127.0.0.1:5432\\\"}  gauge:{value:8.088723e+06}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_database_size_custom_bytes\\\" { label:{name:\\\"datname\\\"  value:\\\"template0\\\"}  label:{name:\\\"server\\\"  value:\\\"127.0.0.1:5432\\\"}  gauge:{value:7.725583e+06}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_database_size_custom_bytes\\\" { label:{name:\\\"datname\\\"  value:\\\"postgres\\\"}  label:{name:\\\"server\\\"  value:\\\"127.0.0.1:5432\\\"}  gauge:{value:8.113299e+06}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_database_size_custom_bytes\\\" { label:{name:\\\"datname\\\"  value:\\\"iamapp\\\"}  label:{name:\\\"server\\\"  value:\\\"127.0.0.1:5432\\\"}  gauge:{value:8.039571e+06}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_replication_flush_lsn_bytes\\\" { label:{name:\\\"application_name\\\"  value:\\\"iam-postgres-cluster-instance1-4l48-0\\\"}  label:{name:\\\"client_addr\\\"  value:\\\"10.42.4.19\\\"}  label:{name:\\\"pid\\\"  value:\\\"374\\\"}  label:{name:\\\"server\\\"  value:\\\"127.0.0.1:5432\\\"}  label:{name:\\\"state\\\"  value:\\\"streaming\\\"}  
label:{name:\\\"sync_state\\\"  value:\\\"async\\\"}  label:{name:\\\"usename\\\"  value:\\\"_crunchyrepl\\\"}  gauge:{value:3.01989888e+08}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_replication_write_lsn_bytes\\\" { label:{name:\\\"application_name\\\"  value:\\\"iam-postgres-cluster-instance1-4l48-0\\\"}  label:{name:\\\"client_addr\\\"  value:\\\"10.42.4.19\\\"}  label:{name:\\\"pid\\\"  value:\\\"374\\\"}  label:{name:\\\"server\\\"  value:\\\"127.0.0.1:5432\\\"}  label:{name:\\\"state\\\"  value:\\\"streaming\\\"}  label:{name:\\\"sync_state\\\"  value:\\\"async\\\"}  label:{name:\\\"usename\\\"  value:\\\"_crunchyrepl\\\"}  gauge:{value:3.01989888e+08}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_replication_replay_lsn_bytes\\\" { label:{name:\\\"application_name\\\"  value:\\\"iam-postgres-cluster-instance1-4l48-0\\\"}  label:{name:\\\"client_addr\\\"  value:\\\"10.42.4.19\\\"}  label:{name:\\\"pid\\\"  value:\\\"374\\\"}  label:{name:\\\"server\\\"  value:\\\"127.0.0.1:5432\\\"}  label:{name:\\\"state\\\"  value:\\\"streaming\\\"}  label:{name:\\\"sync_state\\\"  value:\\\"async\\\"}  label:{name:\\\"usename\\\"  value:\\\"_crunchyrepl\\\"}  gauge:{value:3.01989888e+08}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_replication_write_lag_seconds\\\" { label:{name:\\\"application_name\\\"  value:\\\"iam-postgres-cluster-instance1-4l48-0\\\"}  label:{name:\\\"client_addr\\\"  value:\\\"10.42.4.19\\\"}  label:{name:\\\"pid\\\"  value:\\\"374\\\"}  label:{name:\\\"server\\\"  value:\\\"127.0.0.1:5432\\\"}  label:{name:\\\"state\\\"  value:\\\"streaming\\\"}  label:{name:\\\"sync_state\\\"  value:\\\"async\\\"}  label:{name:\\\"usename\\\"  value:\\\"_crunchyrepl\\\"}  gauge:{value:nan}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_replication_flush_lag_seconds\\\" { label:{name:\\\"application_name\\\"  value:\\\"iam-postgres-cluster-instance1-4l48-0\\\"}  label:{name:\\\"client_addr\\\"  value:\\\"10.42.4.19\\\"}  label:{name:\\\"pid\\\"  value:\\\"374\\\"}  label:{name:\\\"server\\\"  value:\\\"127.0.0.1:5432\\\"}  label:{name:\\\"state\\\"  value:\\\"streaming\\\"}  label:{name:\\\"sync_state\\\"  value:\\\"async\\\"}  label:{name:\\\"usename\\\"  value:\\\"_crunchyrepl\\\"}  gauge:{value:nan}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_replication_replay_lag_seconds\\\" { label:{name:\\\"application_name\\\"  value:\\\"iam-postgres-cluster-instance1-4l48-0\\\"}  label:{name:\\\"client_addr\\\"  value:\\\"10.42.4.19\\\"}  label:{name:\\\"pid\\\"  value:\\\"374\\\"}  label:{name:\\\"server\\\"  value:\\\"127.0.0.1:5432\\\"}  label:{name:\\\"state\\\"  value:\\\"streaming\\\"}  label:{name:\\\"sync_state\\\"  value:\\\"async\\\"}  label:{name:\\\"usename\\\"  value:\\\"_crunchyrepl\\\"}  gauge:{value:nan}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_replication_flush_lsn_bytes\\\" { label:{name:\\\"application_name\\\"  value:\\\"iam-postgres-cluster-instance1-2lg8-0\\\"}  label:{name:\\\"client_addr\\\"  value:\\\"10.42.5.19\\\"}  label:{name:\\\"pid\\\"  value:\\\"752\\\"}  label:{name:\\\"server\\\"  value:\\\"127.0.0.1:5432\\\"}  label:{name:\\\"state\\\"  value:\\\"streaming\\\"}  label:{name:\\\"sync_state\\\"  value:\\\"async\\\"}  label:{name:\\\"usename\\\"  
value:\\\"_crunchyrepl\\\"}  gauge:{value:3.01989888e+08}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_replication_write_lsn_bytes\\\" { label:{name:\\\"application_name\\\"  value:\\\"iam-postgres-cluster-instance1-2lg8-0\\\"}  label:{name:\\\"client_addr\\\"  value:\\\"10.42.5.19\\\"}  label:{name:\\\"pid\\\"  value:\\\"752\\\"}  label:{name:\\\"server\\\"  value:\\\"127.0.0.1:5432\\\"}  label:{name:\\\"state\\\"  value:\\\"streaming\\\"}  label:{name:\\\"sync_state\\\"  value:\\\"async\\\"}  label:{name:\\\"usename\\\"  value:\\\"_crunchyrepl\\\"}  gauge:{value:3.01989888e+08}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_replication_replay_lsn_bytes\\\" { label:{name:\\\"application_name\\\"  value:\\\"iam-postgres-cluster-instance1-2lg8-0\\\"}  label:{name:\\\"client_addr\\\"  value:\\\"10.42.5.19\\\"}  label:{name:\\\"pid\\\"  value:\\\"752\\\"}  label:{name:\\\"server\\\"  value:\\\"127.0.0.1:5432\\\"}  label:{name:\\\"state\\\"  value:\\\"streaming\\\"}  label:{name:\\\"sync_state\\\"  value:\\\"async\\\"}  label:{name:\\\"usename\\\"  value:\\\"_crunchyrepl\\\"}  gauge:{value:3.01989888e+08}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_replication_write_lag_seconds\\\" { label:{name:\\\"application_name\\\"  value:\\\"iam-postgres-cluster-instance1-2lg8-0\\\"}  label:{name:\\\"client_addr\\\"  value:\\\"10.42.5.19\\\"}  label:{name:\\\"pid\\\"  value:\\\"752\\\"}  label:{name:\\\"server\\\"  value:\\\"127.0.0.1:5432\\\"}  label:{name:\\\"state\\\"  value:\\\"streaming\\\"}  label:{name:\\\"sync_state\\\"  value:\\\"async\\\"}  label:{name:\\\"usename\\\"  value:\\\"_crunchyrepl\\\"}  gauge:{value:nan}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_replication_flush_lag_seconds\\\" { label:{name:\\\"application_name\\\"  value:\\\"iam-postgres-cluster-instance1-2lg8-0\\\"}  label:{name:\\\"client_addr\\\"  value:\\\"10.42.5.19\\\"}  label:{name:\\\"pid\\\"  value:\\\"752\\\"}  label:{name:\\\"server\\\"  value:\\\"127.0.0.1:5432\\\"}  label:{name:\\\"state\\\"  value:\\\"streaming\\\"}  label:{name:\\\"sync_state\\\"  value:\\\"async\\\"}  label:{name:\\\"usename\\\"  value:\\\"_crunchyrepl\\\"}  gauge:{value:nan}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_replication_replay_lag_seconds\\\" { label:{name:\\\"application_name\\\"  value:\\\"iam-postgres-cluster-instance1-2lg8-0\\\"}  label:{name:\\\"client_addr\\\"  value:\\\"10.42.5.19\\\"}  label:{name:\\\"pid\\\"  value:\\\"752\\\"}  label:{name:\\\"server\\\"  value:\\\"127.0.0.1:5432\\\"}  label:{name:\\\"state\\\"  value:\\\"streaming\\\"}  label:{name:\\\"sync_state\\\"  value:\\\"async\\\"}  label:{name:\\\"usename\\\"  value:\\\"_crunchyrepl\\\"}  gauge:{value:nan}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_replication_wal_received_lsn\\\" { label:{name:\\\"server\\\"  value:\\\"127.0.0.1:5432\\\"}  gauge:{value:nan}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_replication_wal_lag_bytes\\\" { label:{name:\\\"server\\\"  value:\\\"127.0.0.1:5432\\\"}  gauge:{value:5.0331648e+07}} was collected before with the same name and label values\"" agentID=de1550a3-b421-4f31-a35e-9400e55448fe component=agent-process type=postgres_exporter

As you can see, these log lines are quite long. More importantly, they show that some queries are being run against an instance that is in recovery, i.e. a replica that is still streaming and applying changes from the primary instance.
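
Whether an instance is currently in recovery can be checked with a single built-in PostgreSQL function; this is also the function a custom query could use to decide whether it is safe to run on a given node:

-- Returns true on a replica that is still replaying WAL from the primary,
-- false on the primary itself.
SELECT pg_is_in_recovery();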

More about the problem

The expected behavior would be for pmm-client (or its postgres_exporter) to skip the queries an instance cannot answer because of its role as a replica in the cluster. If that is not possible, it should at least not log these failed queries as errors.
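
As a rough sketch of the first option: postgres_exporter reads custom metrics from a queries YAML file, so a replica-unsafe namespace such as pg_custom_replication_wal could, in principle, guard itself with pg_is_in_recovery(). The namespace name is taken from the logs above; the query body and metric definition below are illustrative assumptions, not PMM's actual shipped query:

pg_custom_replication_wal:
  query: |
    -- Returning zero rows on a replica avoids the "recovery is in progress" error.
    -- The SELECT list is illustrative only; PMM's real query differs.
    SELECT pg_current_wal_lsn() - '0/0'::pg_lsn AS wal_lsn_bytes
    WHERE NOT pg_is_in_recovery()
  metrics:
    - wal_lsn_bytes:
        usage: "GAUGE"
        description: "Current WAL position in bytes (no rows while the instance is in recovery)"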

Steps to reproduce

  1. Start a postgres cluster with the following:
apiVersion: pgv2.percona.com/v2
kind: PerconaPGCluster
metadata:
  name: cluster1
  annotations:
    pgv2.percona.com/custom-patroni-version: "4" # set because the Patroni version can run into issues on the arm64 nodes in my cluster
spec:
  crVersion: 2.7.0
  image: docker.io/percona/percona-postgresql-operator:2.7.0-ppg17.5.2-postgres
  imagePullPolicy: Always
  postgresVersion: 17
  instances:
    - name: instance1
      replicas: 3
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: mycompany.com/run
                    operator: In
                    values: [ "db" ]
      dataVolumeClaimSpec:
        storageClassName: longhorn-fast
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
  proxy:
    pgBouncer:
      replicas: 3
      image: docker.io/percona/percona-pgbouncer:1.24.1

  backups:
    pgbackrest:
      image: docker.io/percona/percona-pgbackrest:2.55.0
      repoHost: {}
      manual:
        repoName: repo1
        options:
          - --type=full
      repos:
        - name: repo1
          schedules:
            full: "0 0 * * 6"
          volume:
            volumeClaimSpec:
              storageClassName: longhorn-fast
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 1Gi
  pmm:
    enabled: true
    image: docker.io/percona/pmm-client:3.3.1
    secret: pmm-iam-service-account-secret
    serverHost: monitoring-service.pmm.svc.cluster.local
  2. Wait for the pmm-client on a replica instance to connect to its database.
  3. Observe log output like the excerpt below (a sketch for retrieving these logs follows it):
time="2025-08-26T10:58:20.201+00:00" level=error msg="ts=2025-08-26T10:58:20.201Z caller=namespace.go:236 level=error err=\"Error running query on database \\\"127.0.0.1:5432\\\": pg_custom_replication_wal pq: recovery is in progress\"" agentID=b8712013-dec7-40ec-8ba0-0a88e92bb969 component=agent-process type=postgres_exporter
time="2025-08-26T10:58:20.214+00:00" level=error msg="ts=2025-08-26T10:58:20.214Z caller=server.go:130 level=error msg=\"NAMESPACE ERRORS FOUND\"" agentID=b8712013-dec7-40ec-8ba0-0a88e92bb969 component=agent-process type=postgres_exporter
time="2025-08-26T10:58:20.214+00:00" level=error msg="ts=2025-08-26T10:58:20.214Z caller=server.go:132 level=error namespace=pg_custom_replication_wal msg=\"Error running query on database \\\"127.0.0.1:5432\\\": pg_custom_replication_wal pq: recovery is in progress\"" agentID=b8712013-dec7-40ec-8ba0-0a88e92bb969 component=agent-process type=postgres_exporter
time="2025-08-26T10:58:20.214+00:00" level=error msg="ts=2025-08-26T10:58:20.214Z caller=postgres_exporter.go:770 level=error err=\"queryNamespaceMappings returned 1 errors\"" agentID=b8712013-dec7-40ec-8ba0-0a88e92bb969 component=agent-process type=postgres_exporter
time="2025-08-26T10:58:23.310+00:00" level=info msg="ts=2025-08-26T10:58:23.310Z caller=percona_exporter.go:86 msg=\"Excluded databases\" databases=\"[template0 template1 cloudsqladmin pmm-managed-dev azure_maintenance rdsadmin]\"" agentID=b8712013-dec7-40ec-8ba0-0a88e92bb969 component=agent-process type=postgres_exporter

Versions

  1. Kubernetes: v1.33.3+k3s1
  2. Operator: v2.7.0
  3. Database: tested with both 2.7.0-ppg16.9-postgres and 2.7.0-ppg17.5.2-postgres

Anything else?

One related note:

I use SELinux on my nodes; maybe that could be causing issues?
