Add epoch version & nodes count sensors in NodeBroker & DynamicNameserver #20625

Merged: 2 commits merged on Jul 8, 2025

Conversation

pixcc (Member) commented Jul 4, 2025

Changelog category

  • Not for changelog (changelog entry is not required)

Description for reviewers

This PR adds new metrics to the NodeBroker and DynamicNameserver components:

  • Epoch version
  • Number of active/expired/removed dynamic nodes
  • Number of static nodes

Main changes:

  • Extended counters_node_broker.proto with four new simple counters
  • Added the UpdateCommittedStateCounters method to TNodeBroker; it is called after transactions that change state
  • Added the UpdateCounters method to TDynamicNameserver; it is called after state changes
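
To illustrate the idea of refreshing counters after a state-changing transaction commits, here is a minimal sketch. It is not the code from this PR: `ENodeState`, `TCommittedState`, `TStateCounters`, `ComputeStateCounters`, and `ExampleAfterCommit` are hypothetical stand-ins, and the real YDB counter types and node bookkeeping are likely different. The sketch assumes that "simple" counters behave like gauges that hold the latest value, so they are recomputed from the committed state rather than incremented.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-ins for illustration; these are NOT the real YDB types.
enum class ENodeState { Active, Expired, Removed };

struct TCommittedState {
    uint64_t EpochVersion = 0;
    std::unordered_map<uint32_t, ENodeState> Nodes; // node ID -> state
};

// Assumption: simple counters act like gauges, each holding the latest value.
struct TStateCounters {
    uint64_t EpochVersion = 0;
    uint64_t ActiveNodes = 0;
    uint64_t ExpiredNodes = 0;
    uint64_t RemovedNodes = 0;
};

// Recompute all counters from the committed state in a single pass.
TStateCounters ComputeStateCounters(const TCommittedState& state) {
    TStateCounters counters;
    counters.EpochVersion = state.EpochVersion;
    for (const auto& entry : state.Nodes) {
        switch (entry.second) {
            case ENodeState::Active:  ++counters.ActiveNodes;  break;
            case ENodeState::Expired: ++counters.ExpiredNodes; break;
            case ENodeState::Removed: ++counters.RemovedNodes; break;
        }
    }
    // Every node falls into exactly one bucket.
    assert(counters.ActiveNodes + counters.ExpiredNodes + counters.RemovedNodes
           == state.Nodes.size());
    return counters;
}

// Usage: build a small committed state, then refresh the counters from it,
// as one would after a state-changing transaction has committed.
inline TStateCounters ExampleAfterCommit() {
    TCommittedState state;
    state.EpochVersion = 7;
    state.Nodes[1] = ENodeState::Active;
    state.Nodes[2] = ENodeState::Active;
    state.Nodes[3] = ENodeState::Expired;
    state.Nodes[4] = ENodeState::Removed;
    return ComputeStateCounters(state);
}
```

Recomputing from the committed state keeps the reported values from drifting if a transaction is retried or rolled back, at the cost of a pass over all nodes on each update.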

@pixcc pixcc requested a review from Copilot July 4, 2025 10:50
@pixcc pixcc linked an issue Jul 4, 2025 that may be closed by this pull request
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR instruments the NodeBroker and DynamicNameserver components with new metrics, including epoch version and active/expired/removed node counts.

  • Extended counters_node_broker.proto with four new simple counters.
  • Added UpdateCommittedStateCounters in TNodeBroker and invoked it after state-changing transactions.
  • Introduced metric registration and UpdateCounters logic in TDynamicNameserver, plus HTML reporting of protocol state.

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| ydb/core/protos/counters_node_broker.proto | Added COUNTER_EPOCH_VERSION, ACTIVE_NODES, EXPIRED_NODES, REMOVED_NODES. |
| ydb/core/mind/node_broker_impl.h | Declared UpdateCommittedStateCounters(). |
| Multiple node_broker__*.cpp transaction files | Inserted calls to UpdateCommittedStateCounters() after each commit. |
| ydb/core/mind/node_broker.cpp | Implemented UpdateCommittedStateCounters(). |
| ydb/core/mind/dynamic_nameserver_mon.cpp | Added ToString(EProtocolState) and rendered protocol state in HTML. |
| ydb/core/mind/dynamic_nameserver_impl.h | Declared counters and UpdateCounters(). |
| ydb/core/mind/dynamic_nameserver.cpp | Registered counters, embedded calls to UpdateCounters(), and implemented it. |

Comments suppressed due to low confidence (3)

ydb/core/mind/dynamic_nameserver.cpp:248

  • The counter key string "ExpireDynamicNodes" is missing a trailing 'd' compared to the variable name ExpiredDynamicNodesCounter. Consider renaming the counter to "ExpiredDynamicNodes" for consistency.
    ExpiredDynamicNodesCounter = counters->GetCounter("ExpireDynamicNodes");

ydb/core/protos/counters_node_broker.proto:13

  • New counters are added in the proto but there are no corresponding unit or integration tests to validate their registration and values. Consider adding tests to cover these metrics.
    COUNTER_EPOCH_VERSION = 3               [(CounterOpts) = {Name: "EpochVersion"}];

ydb/core/mind/node_broker.cpp:789

  • [nitpick] It would be helpful to add a brief comment above this method describing the purpose of each counter (ActiveNodes, ExpiredNodes, RemovedNodes, EpochVersion) and when this method should be invoked.
void TNodeBroker::UpdateCommittedStateCounters() {

github-actions bot commented Jul 4, 2025

2025-07-04 10:51:55 UTC Pre-commit check linux-x86_64-release-asan for cc46d41 has started.
2025-07-04 10:52:09 UTC Artifacts will be uploaded here
2025-07-04 10:55:54 UTC ya make is running...
🟡 2025-07-04 13:30:23 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet

Test history | Ya make output | Test bloat

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
| --- | --- | --- | --- | --- | --- |
| 16336 | 15800 | 0 | 304 | 201 | 31 |

🟢 2025-07-04 13:31:49 UTC Build successful.
🟡 2025-07-04 13:32:19 UTC ydbd size 3.9 GiB changed* by +759.2 KiB, which is >= 100.0 KiB vs main: Warning

| ydbd size dash | main: 1049202 | merge: cc46d41 | diff | diff % |
| --- | --- | --- | --- | --- |
| ydbd size | 4 199 267 048 Bytes | 4 200 044 512 Bytes | +759.2 KiB | +0.019% |
| ydbd stripped size | 1 455 054 296 Bytes | 1 455 247 352 Bytes | +188.5 KiB | +0.013% |

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

github-actions bot commented Jul 4, 2025

2025-07-04 10:52:29 UTC Pre-commit check linux-x86_64-relwithdebinfo for cc46d41 has started.
2025-07-04 10:52:33 UTC Artifacts will be uploaded here
2025-07-04 10:56:17 UTC ya make is running...
🟡 2025-07-04 12:42:49 UTC Some tests failed, follow the links below. Going to retry failed tests...

Test history | Ya make output | Test bloat

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
| --- | --- | --- | --- | --- | --- |
| 38779 | 35855 | 0 | 210 | 2678 | 36 |

2025-07-04 12:46:17 UTC ya make is running... (failed tests rerun, try 2)
🟡 2025-07-04 12:58:30 UTC Some tests failed, follow the links below. Going to retry failed tests...

Test history | Ya make output | Test bloat | Test bloat

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
| --- | --- | --- | --- | --- | --- |
| 653 (only retried tests) | 414 | 0 | 209 | 2 | 28 |

2025-07-04 12:58:41 UTC ya make is running... (failed tests rerun, try 3)
🔴 2025-07-04 13:09:03 UTC Some tests failed, follow the links below.

Test history | Ya make output | Test bloat | Test bloat | Test bloat

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
| --- | --- | --- | --- | --- | --- |
| 568 (only retried tests) | 331 | 0 | 209 | 2 | 26 |

🟢 2025-07-04 13:09:12 UTC Build successful.
🟢 2025-07-04 13:09:36 UTC ydbd size 2.2 GiB changed* by +21.7 KiB, which is < 100.0 KiB vs main: OK

| ydbd size dash | main: cb76c1d | merge: cc46d41 | diff | diff % |
| --- | --- | --- | --- | --- |
| ydbd size | 2 387 856 608 Bytes | 2 387 878 872 Bytes | +21.7 KiB | +0.001% |
| ydbd stripped size | 499 602 472 Bytes | 499 609 640 Bytes | +7.0 KiB | +0.001% |

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

github-actions bot commented Jul 4, 2025

🟢 2025-07-04 11:02:55 UTC The validation of the Pull Request description is successful.

github-actions bot commented Jul 7, 2025

2025-07-07 20:41:27 UTC Pre-commit check linux-x86_64-relwithdebinfo for ae6869b has started.
2025-07-07 20:41:53 UTC Artifacts will be uploaded here
2025-07-07 20:46:10 UTC ya make is running...
🟡 2025-07-07 22:29:35 UTC Some tests failed, follow the links below. Going to retry failed tests...

Test history | Ya make output | Test bloat

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
| --- | --- | --- | --- | --- | --- |
| 38845 | 36088 | 0 | 5 | 2704 | 48 |

2025-07-07 22:33:04 UTC ya make is running... (failed tests rerun, try 2)
🟢 2025-07-07 22:44:37 UTC Tests successful.

Test history | Ya make output | Test bloat | Test bloat

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
| --- | --- | --- | --- | --- | --- |
| 587 (only retried tests) | 548 | 0 | 0 | 2 | 37 |

🟢 2025-07-07 22:44:46 UTC Build successful.
🟢 2025-07-07 22:45:07 UTC ydbd size 2.2 GiB changed* by +26.3 KiB, which is < 100.0 KiB vs main: OK

| ydbd size dash | main: fb95647 | merge: ae6869b | diff | diff % |
| --- | --- | --- | --- | --- |
| ydbd size | 2 388 278 960 Bytes | 2 388 305 856 Bytes | +26.3 KiB | +0.001% |
| ydbd stripped size | 499 794 696 Bytes | 499 806 024 Bytes | +11.1 KiB | +0.002% |

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

github-actions bot commented Jul 7, 2025

2025-07-07 20:41:30 UTC Pre-commit check linux-x86_64-release-asan for ae6869b has started.
2025-07-07 20:41:41 UTC Artifacts will be uploaded here
2025-07-07 20:45:17 UTC ya make is running...
🟡 2025-07-07 22:59:24 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet

Test history | Ya make output | Test bloat

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
| --- | --- | --- | --- | --- | --- |
| 16386 | 16007 | 0 | 140 | 206 | 33 |

🟢 2025-07-07 23:00:48 UTC Build successful.
🟢 2025-07-07 23:01:17 UTC ydbd size 3.9 GiB changed* by +65.9 KiB, which is < 100.0 KiB vs main: OK

| ydbd size dash | main: fb95647 | merge: ae6869b | diff | diff % |
| --- | --- | --- | --- | --- |
| ydbd size | 4 199 875 936 Bytes | 4 199 943 464 Bytes | +65.9 KiB | +0.002% |
| ydbd stripped size | 1 455 299 128 Bytes | 1 455 319 288 Bytes | +19.7 KiB | +0.001% |

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

@pixcc pixcc marked this pull request as ready for review July 8, 2025 08:53
@pixcc pixcc requested review from snaury and a team as code owners July 8, 2025 08:53
@pixcc pixcc merged commit 9adb3a0 into ydb-platform:main Jul 8, 2025
12 checks passed
pixcc added a commit to pixcc/ydb that referenced this pull request Jul 11, 2025
Successfully merging this pull request may close these issues.

Add version & nodes count sensors for DynamicNameserver