Report
After setting up a PostgreSQL cluster with the Percona PostgreSQL Operator and enabling pmm-client on it, the pmm-client sidecars attached to the replica PostgreSQL instances spam the following errors:
time="2025-08-25T15:29:02.407+00:00" level=info msg="ts=2025-08-25T15:29:02.407Z caller=percona_exporter.go:86 msg=\"Excluded databases\" databases=\"[template0 template1 cloudsqladmin pmm-managed-dev azure_maintenance rdsadmin]\"" agentID=cd482abd-7606-4364-a30b-3105066e76e7 component=agent-process type=postgres_exporter
time="2025-08-25T15:29:07.094+00:00" level=info msg="ts=2025-08-25T15:29:07.094Z caller=percona_exporter.go:86 msg=\"Excluded databases\" databases=\"[template0 template1 cloudsqladmin pmm-managed-dev azure_maintenance rdsadmin]\"" agentID=cd482abd-7606-4364-a30b-3105066e76e7 component=agent-process type=postgres_exporter
time="2025-08-25T15:29:07.114+00:00" level=error msg="ts=2025-08-25T15:29:07.113Z caller=namespace.go:236 level=error err=\"Error running query on database \\\"127.0.0.1:5432\\\": pg_custom_replication_wal pq: recovery is in progress\"" agentID=cd482abd-7606-4364-a30b-3105066e76e7 component=agent-process type=postgres_exporter
time="2025-08-25T15:29:07.117+00:00" level=error msg="ts=2025-08-25T15:29:07.117Z caller=namespace.go:236 level=error err=\"Error running query on database \\\"127.0.0.1:5432\\\": pg_custom_replication_wal pq: recovery is in progress\"" agentID=cd482abd-7606-4364-a30b-3105066e76e7 component=agent-process type=postgres_exporter
time="2025-08-25T15:29:07.119+00:00" level=error msg="ts=2025-08-25T15:29:07.119Z caller=server.go:130 level=error msg=\"NAMESPACE ERRORS FOUND\"" agentID=cd482abd-7606-4364-a30b-3105066e76e7 component=agent-process type=postgres_exporter
time="2025-08-25T15:29:07.119+00:00" level=error msg="ts=2025-08-25T15:29:07.119Z caller=server.go:132 level=error namespace=pg_custom_replication_wal msg=\"Error running query on database \\\"127.0.0.1:5432\\\": pg_custom_replication_wal pq: recovery is in progress\"" agentID=cd482abd-7606-4364-a30b-3105066e76e7 component=agent-process type=postgres_exporter
time="2025-08-25T15:29:07.119+00:00" level=error msg="ts=2025-08-25T15:29:07.119Z caller=postgres_exporter.go:770 level=error err=\"queryNamespaceMappings returned 1 errors\"" agentID=cd482abd-7606-4364-a30b-3105066e76e7 component=agent-process type=postgres_exporter
time="2025-08-25T15:29:07.124+00:00" level=error msg="ts=2025-08-25T15:29:07.124Z caller=server.go:130 level=error msg=\"NAMESPACE ERRORS FOUND\"" agentID=cd482abd-7606-4364-a30b-3105066e76e7 component=agent-process type=postgres_exporter
time="2025-08-25T15:29:07.124+00:00" level=error msg="ts=2025-08-25T15:29:07.124Z caller=server.go:132 level=error namespace=pg_custom_replication_wal msg=\"Error running query on database \\\"127.0.0.1:5432\\\": pg_custom_replication_wal pq: recovery is in progress\"" agentID=cd482abd-7606-4364-a30b-3105066e76e7 component=agent-process type=postgres_exporter
time="2025-08-25T15:29:07.124+00:00" level=error msg="ts=2025-08-25T15:29:07.124Z caller=postgres_exporter.go:770 level=error err=\"queryNamespaceMappings returned 1 errors\"" agentID=cd482abd-7606-4364-a30b-3105066e76e7 component=agent-process type=postgres_exporter
time="2025-08-25T15:29:07.125+00:00" level=info msg="ts=2025-08-25T15:29:07.125Z caller=log.go:245 msg=\"handlererror gathering metrics: 11 error(s) occurred:\\n* collected metric \\\"pg_custom_database_size_custom_bytes\\\" { label:{name:\\\"datname\\\" value:\\\"template1\\\"} label:{name:\\\"server\\\" value:\\\"127.0.0.1:5432\\\"} gauge:{value:7.930383e+06}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_database_size_custom_bytes\\\" { label:{name:\\\"datname\\\" value:\\\"template0\\\"} label:{name:\\\"server\\\" value:\\\"127.0.0.1:5432\\\"} gauge:{value:7.725583e+06}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_database_size_custom_bytes\\\" { label:{name:\\\"datname\\\" value:\\\"postgres\\\"} label:{name:\\\"server\\\" value:\\\"127.0.0.1:5432\\\"} gauge:{value:8.113299e+06}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_database_size_custom_bytes\\\" { label:{name:\\\"datname\\\" value:\\\"iamapp\\\"} label:{name:\\\"server\\\" value:\\\"127.0.0.1:5432\\\"} gauge:{value:8.039571e+06}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_wal_receiver_written_lsn_bytes\\\" { label:{name:\\\"primary_host\\\" value:\\\"iam-postgres-cluster-instance1-bn7t-0.iam-postgres-cluster-pods\\\"} label:{name:\\\"server\\\" value:\\\"127.0.0.1:5432\\\"} label:{name:\\\"status\\\" value:\\\"streaming\\\"} gauge:{value:3.01989888e+08}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_wal_receiver_latest_end_lsn_bytes\\\" { label:{name:\\\"primary_host\\\" value:\\\"iam-postgres-cluster-instance1-bn7t-0.iam-postgres-cluster-pods\\\"} label:{name:\\\"server\\\" value:\\\"127.0.0.1:5432\\\"} label:{name:\\\"status\\\" value:\\\"streaming\\\"} gauge:{value:3.01989888e+08}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_wal_receiver_lag_bytes\\\" { label:{name:\\\"primary_host\\\" value:\\\"iam-postgres-cluster-instance1-bn7t-0.iam-postgres-cluster-pods\\\"} label:{name:\\\"server\\\" value:\\\"127.0.0.1:5432\\\"} label:{name:\\\"status\\\" value:\\\"streaming\\\"} gauge:{value:0}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_wal_receiver_latest_end_time_seconds\\\" { label:{name:\\\"primary_host\\\" value:\\\"iam-postgres-cluster-instance1-bn7t-0.iam-postgres-cluster-pods\\\"} label:{name:\\\"server\\\" value:\\\"127.0.0.1:5432\\\"} label:{name:\\\"status\\\" value:\\\"streaming\\\"} gauge:{value:1.756135467307919e+09}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_wal_receiver_lag_time_seconds\\\" { label:{name:\\\"primary_host\\\" value:\\\"iam-postgres-cluster-instance1-bn7t-0.iam-postgres-cluster-pods\\\"} label:{name:\\\"server\\\" value:\\\"127.0.0.1:5432\\\"} label:{name:\\\"status\\\" value:\\\"streaming\\\"} gauge:{value:279.816133}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_wal_receiver_last_msg_send_time_seconds\\\" { label:{name:\\\"primary_host\\\" value:\\\"iam-postgres-cluster-instance1-bn7t-0.iam-postgres-cluster-pods\\\"} label:{name:\\\"server\\\" value:\\\"127.0.0.1:5432\\\"} label:{name:\\\"status\\\" value:\\\"streaming\\\"} gauge:{value:1.7561357382459e+09}} was collected before with the same name and label values\\n* collected metric 
\\\"pg_custom_stat_wal_receiver_last_msg_receipt_time_seconds\\\" { label:{name:\\\"primary_host\\\" value:\\\"iam-postgres-cluster-instance1-bn7t-0.iam-postgres-cluster-pods\\\"} label:{name:\\\"server\\\" value:\\\"127.0.0.1:5432\\\"} label:{name:\\\"status\\\" value:\\\"streaming\\\"} gauge:{value:1.756135738258575e+09}} was collected before with the same name and label values\"" agentID=cd482abd-7606-4364-a30b-3105066e76e7 component=agent-process type=postgres_exporter
Meanwhile, the pmm-client sidecar running next to the primary instance reports:
time="2025-08-25T15:31:53.632+00:00" level=info msg="ts=2025-08-25T15:31:53.631Z caller=percona_exporter.go:86 msg=\"Excluded databases\" databases=\"[template0 template1 cloudsqladmin pmm-managed-dev azure_maintenance rdsadmin]\"" agentID=de1550a3-b421-4f31-a35e-9400e55448fe component=agent-process type=postgres_exporter
time="2025-08-25T15:31:53.658+00:00" level=info msg="ts=2025-08-25T15:31:53.658Z caller=log.go:245 msg=\"handlererror gathering metrics: 22 error(s) occurred:\\n* collected metric \\\"pg_custom_stat_activity_walsender_pid\\\" { label:{name:\\\"application_name\\\" value:\\\"iam-postgres-cluster-instance1-2lg8-0\\\"} label:{name:\\\"client_addr\\\" value:\\\"10.42.5.19\\\"} label:{name:\\\"server\\\" value:\\\"127.0.0.1:5432\\\"} label:{name:\\\"state\\\" value:\\\"active\\\"} label:{name:\\\"usename\\\" value:\\\"_crunchyrepl\\\"} gauge:{value:752}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_activity_walsender_backend_start_unix\\\" { label:{name:\\\"application_name\\\" value:\\\"iam-postgres-cluster-instance1-2lg8-0\\\"} label:{name:\\\"client_addr\\\" value:\\\"10.42.5.19\\\"} label:{name:\\\"server\\\" value:\\\"127.0.0.1:5432\\\"} label:{name:\\\"state\\\" value:\\\"active\\\"} label:{name:\\\"usename\\\" value:\\\"_crunchyrepl\\\"} gauge:{value:1.756135133033277e+09}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_activity_walsender_pid\\\" { label:{name:\\\"application_name\\\" value:\\\"iam-postgres-cluster-instance1-4l48-0\\\"} label:{name:\\\"client_addr\\\" value:\\\"10.42.4.19\\\"} label:{name:\\\"server\\\" value:\\\"127.0.0.1:5432\\\"} label:{name:\\\"state\\\" value:\\\"active\\\"} label:{name:\\\"usename\\\" value:\\\"_crunchyrepl\\\"} gauge:{value:374}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_activity_walsender_backend_start_unix\\\" { label:{name:\\\"application_name\\\" value:\\\"iam-postgres-cluster-instance1-4l48-0\\\"} label:{name:\\\"client_addr\\\" value:\\\"10.42.4.19\\\"} label:{name:\\\"server\\\" value:\\\"127.0.0.1:5432\\\"} label:{name:\\\"state\\\" value:\\\"active\\\"} label:{name:\\\"usename\\\" value:\\\"_crunchyrepl\\\"} gauge:{value:1.756135109421014e+09}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_database_size_custom_bytes\\\" { label:{name:\\\"datname\\\" value:\\\"template1\\\"} label:{name:\\\"server\\\" value:\\\"127.0.0.1:5432\\\"} gauge:{value:8.088723e+06}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_database_size_custom_bytes\\\" { label:{name:\\\"datname\\\" value:\\\"template0\\\"} label:{name:\\\"server\\\" value:\\\"127.0.0.1:5432\\\"} gauge:{value:7.725583e+06}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_database_size_custom_bytes\\\" { label:{name:\\\"datname\\\" value:\\\"postgres\\\"} label:{name:\\\"server\\\" value:\\\"127.0.0.1:5432\\\"} gauge:{value:8.113299e+06}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_database_size_custom_bytes\\\" { label:{name:\\\"datname\\\" value:\\\"iamapp\\\"} label:{name:\\\"server\\\" value:\\\"127.0.0.1:5432\\\"} gauge:{value:8.039571e+06}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_replication_flush_lsn_bytes\\\" { label:{name:\\\"application_name\\\" value:\\\"iam-postgres-cluster-instance1-4l48-0\\\"} label:{name:\\\"client_addr\\\" value:\\\"10.42.4.19\\\"} label:{name:\\\"pid\\\" value:\\\"374\\\"} label:{name:\\\"server\\\" value:\\\"127.0.0.1:5432\\\"} label:{name:\\\"state\\\" value:\\\"streaming\\\"} label:{name:\\\"sync_state\\\" value:\\\"async\\\"} label:{name:\\\"usename\\\" 
value:\\\"_crunchyrepl\\\"} gauge:{value:3.01989888e+08}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_replication_write_lsn_bytes\\\" { label:{name:\\\"application_name\\\" value:\\\"iam-postgres-cluster-instance1-4l48-0\\\"} label:{name:\\\"client_addr\\\" value:\\\"10.42.4.19\\\"} label:{name:\\\"pid\\\" value:\\\"374\\\"} label:{name:\\\"server\\\" value:\\\"127.0.0.1:5432\\\"} label:{name:\\\"state\\\" value:\\\"streaming\\\"} label:{name:\\\"sync_state\\\" value:\\\"async\\\"} label:{name:\\\"usename\\\" value:\\\"_crunchyrepl\\\"} gauge:{value:3.01989888e+08}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_replication_replay_lsn_bytes\\\" { label:{name:\\\"application_name\\\" value:\\\"iam-postgres-cluster-instance1-4l48-0\\\"} label:{name:\\\"client_addr\\\" value:\\\"10.42.4.19\\\"} label:{name:\\\"pid\\\" value:\\\"374\\\"} label:{name:\\\"server\\\" value:\\\"127.0.0.1:5432\\\"} label:{name:\\\"state\\\" value:\\\"streaming\\\"} label:{name:\\\"sync_state\\\" value:\\\"async\\\"} label:{name:\\\"usename\\\" value:\\\"_crunchyrepl\\\"} gauge:{value:3.01989888e+08}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_replication_write_lag_seconds\\\" { label:{name:\\\"application_name\\\" value:\\\"iam-postgres-cluster-instance1-4l48-0\\\"} label:{name:\\\"client_addr\\\" value:\\\"10.42.4.19\\\"} label:{name:\\\"pid\\\" value:\\\"374\\\"} label:{name:\\\"server\\\" value:\\\"127.0.0.1:5432\\\"} label:{name:\\\"state\\\" value:\\\"streaming\\\"} label:{name:\\\"sync_state\\\" value:\\\"async\\\"} label:{name:\\\"usename\\\" value:\\\"_crunchyrepl\\\"} gauge:{value:nan}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_replication_flush_lag_seconds\\\" { label:{name:\\\"application_name\\\" value:\\\"iam-postgres-cluster-instance1-4l48-0\\\"} label:{name:\\\"client_addr\\\" value:\\\"10.42.4.19\\\"} label:{name:\\\"pid\\\" value:\\\"374\\\"} label:{name:\\\"server\\\" value:\\\"127.0.0.1:5432\\\"} label:{name:\\\"state\\\" value:\\\"streaming\\\"} label:{name:\\\"sync_state\\\" value:\\\"async\\\"} label:{name:\\\"usename\\\" value:\\\"_crunchyrepl\\\"} gauge:{value:nan}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_replication_replay_lag_seconds\\\" { label:{name:\\\"application_name\\\" value:\\\"iam-postgres-cluster-instance1-4l48-0\\\"} label:{name:\\\"client_addr\\\" value:\\\"10.42.4.19\\\"} label:{name:\\\"pid\\\" value:\\\"374\\\"} label:{name:\\\"server\\\" value:\\\"127.0.0.1:5432\\\"} label:{name:\\\"state\\\" value:\\\"streaming\\\"} label:{name:\\\"sync_state\\\" value:\\\"async\\\"} label:{name:\\\"usename\\\" value:\\\"_crunchyrepl\\\"} gauge:{value:nan}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_replication_flush_lsn_bytes\\\" { label:{name:\\\"application_name\\\" value:\\\"iam-postgres-cluster-instance1-2lg8-0\\\"} label:{name:\\\"client_addr\\\" value:\\\"10.42.5.19\\\"} label:{name:\\\"pid\\\" value:\\\"752\\\"} label:{name:\\\"server\\\" value:\\\"127.0.0.1:5432\\\"} label:{name:\\\"state\\\" value:\\\"streaming\\\"} label:{name:\\\"sync_state\\\" value:\\\"async\\\"} label:{name:\\\"usename\\\" value:\\\"_crunchyrepl\\\"} gauge:{value:3.01989888e+08}} was collected before with the same name and label values\\n* collected metric 
\\\"pg_custom_stat_replication_write_lsn_bytes\\\" { label:{name:\\\"application_name\\\" value:\\\"iam-postgres-cluster-instance1-2lg8-0\\\"} label:{name:\\\"client_addr\\\" value:\\\"10.42.5.19\\\"} label:{name:\\\"pid\\\" value:\\\"752\\\"} label:{name:\\\"server\\\" value:\\\"127.0.0.1:5432\\\"} label:{name:\\\"state\\\" value:\\\"streaming\\\"} label:{name:\\\"sync_state\\\" value:\\\"async\\\"} label:{name:\\\"usename\\\" value:\\\"_crunchyrepl\\\"} gauge:{value:3.01989888e+08}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_replication_replay_lsn_bytes\\\" { label:{name:\\\"application_name\\\" value:\\\"iam-postgres-cluster-instance1-2lg8-0\\\"} label:{name:\\\"client_addr\\\" value:\\\"10.42.5.19\\\"} label:{name:\\\"pid\\\" value:\\\"752\\\"} label:{name:\\\"server\\\" value:\\\"127.0.0.1:5432\\\"} label:{name:\\\"state\\\" value:\\\"streaming\\\"} label:{name:\\\"sync_state\\\" value:\\\"async\\\"} label:{name:\\\"usename\\\" value:\\\"_crunchyrepl\\\"} gauge:{value:3.01989888e+08}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_replication_write_lag_seconds\\\" { label:{name:\\\"application_name\\\" value:\\\"iam-postgres-cluster-instance1-2lg8-0\\\"} label:{name:\\\"client_addr\\\" value:\\\"10.42.5.19\\\"} label:{name:\\\"pid\\\" value:\\\"752\\\"} label:{name:\\\"server\\\" value:\\\"127.0.0.1:5432\\\"} label:{name:\\\"state\\\" value:\\\"streaming\\\"} label:{name:\\\"sync_state\\\" value:\\\"async\\\"} label:{name:\\\"usename\\\" value:\\\"_crunchyrepl\\\"} gauge:{value:nan}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_replication_flush_lag_seconds\\\" { label:{name:\\\"application_name\\\" value:\\\"iam-postgres-cluster-instance1-2lg8-0\\\"} label:{name:\\\"client_addr\\\" value:\\\"10.42.5.19\\\"} label:{name:\\\"pid\\\" value:\\\"752\\\"} label:{name:\\\"server\\\" value:\\\"127.0.0.1:5432\\\"} label:{name:\\\"state\\\" value:\\\"streaming\\\"} label:{name:\\\"sync_state\\\" value:\\\"async\\\"} label:{name:\\\"usename\\\" value:\\\"_crunchyrepl\\\"} gauge:{value:nan}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_stat_replication_replay_lag_seconds\\\" { label:{name:\\\"application_name\\\" value:\\\"iam-postgres-cluster-instance1-2lg8-0\\\"} label:{name:\\\"client_addr\\\" value:\\\"10.42.5.19\\\"} label:{name:\\\"pid\\\" value:\\\"752\\\"} label:{name:\\\"server\\\" value:\\\"127.0.0.1:5432\\\"} label:{name:\\\"state\\\" value:\\\"streaming\\\"} label:{name:\\\"sync_state\\\" value:\\\"async\\\"} label:{name:\\\"usename\\\" value:\\\"_crunchyrepl\\\"} gauge:{value:nan}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_replication_wal_received_lsn\\\" { label:{name:\\\"server\\\" value:\\\"127.0.0.1:5432\\\"} gauge:{value:nan}} was collected before with the same name and label values\\n* collected metric \\\"pg_custom_replication_wal_lag_bytes\\\" { label:{name:\\\"server\\\" value:\\\"127.0.0.1:5432\\\"} gauge:{value:5.0331648e+07}} was collected before with the same name and label values\"" agentID=de1550a3-b421-4f31-a35e-9400e55448fe component=agent-process type=postgres_exporter
These log lines are quite long, and as they indicate, some queries are being run against an instance that is in recovery, i.e. a replica that is still receiving and applying WAL changes streamed from the primary instance.
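For context, whether an instance is in recovery can be checked directly in PostgreSQL, and WAL control functions refuse to run on a standby, which matches the error the exporter logs. A minimal check (I have not inspected the exporter's actual pg_custom_replication_wal query, so treating it as one that calls a WAL control function is an assumption):

-- On a replica (standby) this returns true:
SELECT pg_is_in_recovery();

-- WAL control functions cannot be executed during recovery, so on a replica
-- a call like the following fails with the same message the exporter logs:
--   ERROR:  recovery is in progress
SELECT pg_current_wal_lsn();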
More about the problem
Expected behavior would be for the pmm-client (or its postgres_exporter) to skip the queries that the instance cannot answer because of its role as a replica in the cluster. If that is not possible, it should at least not log such failed queries as errors.
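As an illustration only, assuming the custom query reads the current WAL position (the actual query shipped with pmm-client may differ), it could be guarded with pg_is_in_recovery() so it returns a sensible value on both primaries and replicas instead of erroring:

-- Hypothetical replica-safe variant of a WAL position query:
-- on a primary it reports the current write position, on a standby it
-- falls back to the last WAL position received from the primary.
SELECT CASE
         WHEN pg_is_in_recovery() THEN pg_last_wal_receive_lsn()
         ELSE pg_current_wal_lsn()
       END AS wal_lsn;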
Steps to reproduce
- Start a PostgreSQL cluster with the following manifest:
apiVersion: pgv2.percona.com/v2
kind: PerconaPGCluster
metadata:
  name: cluster1
  annotations:
    pgv2.percona.com/custom-patroni-version: "4" # this is set because the patroni version can have issues on arm64 nodes in my cluster
spec:
  crVersion: 2.7.0
  image: docker.io/percona/percona-postgresql-operator:2.7.0-ppg17.5.2-postgres
  imagePullPolicy: Always
  postgresVersion: 17
  instances:
    - name: instance1
      replicas: 3
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: mycompany.com/run
                    operator: In
                    values: [ "db" ]
      dataVolumeClaimSpec:
        storageClassName: longhorn-fast
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
  proxy:
    pgBouncer:
      replicas: 3
      image: docker.io/percona/percona-pgbouncer:1.24.1
  backups:
    pgbackrest:
      image: docker.io/percona/percona-pgbackrest:2.55.0
      repoHost: {}
      manual:
        repoName: repo1
        options:
          - --type=full
      repos:
        - name: repo1
          schedules:
            full: "0 0 * * 6"
          volume:
            volumeClaimSpec:
              storageClassName: longhorn-fast
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 1Gi
  pmm:
    enabled: true
    image: docker.io/percona/pmm-client:3.3.1
    secret: pmm-iam-service-account-secret
    serverHost: monitoring-service.pmm.svc.cluster.local
- Wait for the pmm-client sidecar on a replica instance to connect to its database.
- Observe something like:
time="2025-08-26T10:58:20.201+00:00" level=error msg="ts=2025-08-26T10:58:20.201Z caller=namespace.go:236 level=error err=\"Error running query on database \\\"127.0.0.1:5432\\\": pg_custom_replication_wal pq: recovery is in progress\"" agentID=b8712013-dec7-40ec-8ba0-0a88e92bb969 component=agent-process type=postgres_exporter
time="2025-08-26T10:58:20.214+00:00" level=error msg="ts=2025-08-26T10:58:20.214Z caller=server.go:130 level=error msg=\"NAMESPACE ERRORS FOUND\"" agentID=b8712013-dec7-40ec-8ba0-0a88e92bb969 component=agent-process type=postgres_exporter
time="2025-08-26T10:58:20.214+00:00" level=error msg="ts=2025-08-26T10:58:20.214Z caller=server.go:132 level=error namespace=pg_custom_replication_wal msg=\"Error running query on database \\\"127.0.0.1:5432\\\": pg_custom_replication_wal pq: recovery is in progress\"" agentID=b8712013-dec7-40ec-8ba0-0a88e92bb969 component=agent-process type=postgres_exporter
time="2025-08-26T10:58:20.214+00:00" level=error msg="ts=2025-08-26T10:58:20.214Z caller=postgres_exporter.go:770 level=error err=\"queryNamespaceMappings returned 1 errors\"" agentID=b8712013-dec7-40ec-8ba0-0a88e92bb969 component=agent-process type=postgres_exporter
time="2025-08-26T10:58:23.310+00:00" level=info msg="ts=2025-08-26T10:58:23.310Z caller=percona_exporter.go:86 msg=\"Excluded databases\" databases=\"[template0 template1 cloudsqladmin pmm-managed-dev azure_maintenance rdsadmin]\"" agentID=b8712013-dec7-40ec-8ba0-0a88e92bb969 component=agent-process type=postgres_exporter
Versions
- Kubernetes: v1.33.3+k3s1
- Operator: v2.7.0
- Database: tested with both 2.7.0-ppg16.9-postgres and 2.7.0-ppg17.5.2-postgres
Anything else?
There are some related things I found:
- https://forums.percona.com/t/pmm3-postgres-on-read-only-pg-custom-replication-wal-pq-recovery-is-in-progress/38681
- prometheus-community/postgres_exporter#962: "Exporter container log of replica instance always reports pg_replication_slots pq: recovery is in progress"
I use SELinux on my nodes; maybe that could cause issues?