Skip to content

Changes in pg scrape behavior in version 3.5 #825

@ShkodaAlexey

Description

@ShkodaAlexey

hi, after upgrading from version 3.1.0 to version 3.5.0 I noticed strange behavior in the collected metrics when saving in PG, which made me go to investigate by versions what exactly changed.

  1. 3.1.0 -> 3.2.0 without changes
  2. 3.2.0 -> 3.3.0 I see that sys_id and real_dbname were added to the data field (maybe I didn't read the changelog well but I didn't find this change in it)
  3. 3.3.0 -> 3.4.0 I didn't notice any changes
  4. 3.4.0 - 3.5.0 here very strange behavior starts, epoch_ns was added to the data field, also pid and a random query appear in the tag_data field in those metrics where it is not declared, and where it is declared it duplicates the query from data (but the queries are completely different), writing additional data to tag_data looks like a bug (+ declared tag_data env/cluster/database/tablename/ndex_full_name disappears). And what is very interesting is that if I turn off the collection of stat_statements in the preset, then in the problematic metrics, pid and query will disappear from tag_data, only epoch_ns remains in data.

invalid_indexes metric example

invalid_indexes:
          sqls:
              11: |-
                  select /* pgwatch_generated */
                    (extract(epoch from now()) * 1e9)::int8 as epoch_ns,
                    format('%I.%I', n.nspname , ci.relname) as tag_index_full_name,
                    cr.relname as  tag_tablename,
                    1 as int
                  from
                    pg_index i
                    join pg_class ci on ci.oid = i.indexrelid
                    join pg_class cr on cr.oid = i.indrelid
                    join pg_namespace n on n.oid = ci.relnamespace
                  where not n.nspname like E'pg\\_temp%'
                  and not indisvalid
                  and not exists ( /* leave out ones that are being actively rebuilt */
                    select * from pg_locks l
                    join pg_stat_activity a using (pid)
                    where l.relation = i.indexrelid
                    and a.state = 'active'
                    and a.query ~* 'concurrently'
                  )
                  limit 100
          node_status: primary    

tag 3.1.0 -> 3.2.0 (with stat_statements)

│ time     │ 2025-06-30 07:28:50.625773                                                                                                                                                                       
│ dbname   │ db1-host1                                                                                                                                                                               
│ data     │ {"int": 1}                                                                                                                                                                                              
│ tag_data │ {"host": "host1", "env": "stage", "cluster": "tst-cluster", "database": "db1", "tablename": "mytbl", "index_full_name": "public.mytbl_t_idx_ccnew"}    

tag 3.2.0 -> 3.3.0-> 3.4.0 (with stat_statements)

│ time     │ 2025-06-30 07:36:28.674967                                                                                                                                                                       
│ dbname   │ db1-host1                                                                                                                                                                               
│ data     │ {"int": 1, "sys_id": "7477487578428529559", "real_dbname": "db1"}                                                                                                                                       
│ tag_data │ {"host": "host1", "env": "stage", "cluster": "tst-cluster", "database": "db1", "tablename": "mytbl", "index_full_name": "public.mytbl_t_idx_ccnew"}    

3.4.0 -> 3.5.0 (with stat_statements)

│ time     │ 2025-06-30 07:39:05.410566
│ dbname   │ db1-host1
│ data     │ {"int": 1, "sys_id": "7477487578428529559", "epoch_ns": 1751258345410566000, "real_dbname": "db1"}
│ tag_data │ {"pid": "2894135", "host": "host1", "query": "select /* pgwatch_generated */ (extract($1 from now()) * $2)::int8 as epoch_ns, format($3, n.nspname , ci.relname) as 

3.4.0 -> 3.5.0 (disable scrape stat_statements)

 time     │ 2025-06-30 07:43:57.312991
│ dbname   │ db1-host1
│ data     │ {"int": 1, "sys_id": "7477487578428529559", "epoch_ns": 1751258637312991000, "real_dbname": "db1"}                                                                                                      
│ tag_data │ {"pid": "2894135", "host": "host1", "env": "stage", "cluster": "tst-cluster", "database": "db1", "tablename": "mytbl", "index_full_name": "public.mytbl

Metadata

Metadata

Assignees

Labels

bugSomething isn't workingmetricsMetrics related issuesquestionFurther information is requested

Type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions