metrics: Recommend better metrics for global produce/consume TP #1269
Review skipped: auto incremental reviews are disabled on this repository.
Walkthrough: The changes update documentation to replace references to the Kafka-specific metric …
Estimated code review effort: 1 (Trivial), ~2 minutes
✅ Deploy Preview for redpanda-docs-preview ready!
Actionable comments posted: 2
🧹 Nitpick comments (2)
modules/upgrade/partials/rolling-upgrades/check-metrics.adoc (1)
19-22: Describe counters accurately and remind users to wrap them in rate()
redpanda_rpc_received_bytes and redpanda_rpc_sent_bytes are monotonically increasing counters (bytes received vs. sent).
Calling them "Total bytes processed" is vague and, more importantly, the table omits the usual guidance to wrap counters in a rate() query. Consider tightening the wording and adding the rate() reminder so users don't misinterpret the raw counter values as throughput.
-| Total bytes processed for Kafka requests.
+| Counters for bytes **received from** (produce) and **sent to** (consume) Kafka clients.
+Use `rate(<metric>[5m])` (or a similar interval) to convert these counters into per-second throughput.
Also double-check that the two new xrefs resolve; broken anchors will render as plain text.
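To illustrate why the raw counter values should not be read as throughput, here is a minimal Python sketch of what a rate() query approximates. The function name `per_second_rate` and the sample values are hypothetical, and the sketch deliberately omits the extrapolation to window boundaries that PromQL's rate() performs, so treat it as an approximation rather than a reimplementation.

```python
from typing import List, Tuple

# (unix timestamp in seconds, counter value in bytes)
Sample = Tuple[float, float]


def per_second_rate(samples: List[Sample]) -> float:
    """Approximate a per-second rate from monotonically increasing counter samples.

    Simplified on purpose: no extrapolation to the window edges.
    A drop in value is treated as a counter reset (for example, a broker restart).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarts from zero, so the new value is the increase.
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must be increasing")
    return increase / elapsed


# Hypothetical scrapes of redpanda_rpc_received_bytes from a single broker.
samples = [(0.0, 1_000_000.0), (30.0, 1_600_000.0), (60.0, 2_200_000.0)]
print(per_second_rate(samples))  # 20000.0 bytes per second
```

The last raw sample (2,200,000) says nothing about current load on its own; only the change over the window (1,200,000 bytes in 60 seconds) yields a throughput figure.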
modules/manage/partials/monitor-health.adoc (1)
92-93: Phrasing nit: reorder for clarity
"with the kafka redpanda_server label" reads awkwardly. Swapping the order clarifies that kafka is the label value:
-monitor … with the `kafka` `redpanda_server` label.
+monitor … with the label `redpanda_server="kafka"`.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
modules/manage/partials/monitor-health.adoc (2 hunks)
modules/upgrade/partials/rolling-upgrades/check-metrics.adoc (1 hunk)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: JakeSCahill
PR: redpanda-data/docs#1192
File: modules/deploy/partials/requirements.adoc:91-93
Timestamp: 2025-07-02T14:54:03.506Z
Learning: In Redpanda documentation, use GiB (binary units, powers of 2) for Kubernetes-specific memory requirements because Kubernetes treats memory units like Mi, Gi as binary units. Use GB (decimal units, powers of 10) for general broker memory requirements in non-Kubernetes contexts.
Learnt from: Feediver1
PR: redpanda-data/docs#1153
File: modules/reference/pages/properties/topic-properties.adoc:45-50
Timestamp: 2025-07-16T19:33:20.420Z
Learning: In the Redpanda documentation, topic property cross-references like <<max.compaction.lag.ms>> and <<min.compaction.lag.ms>> require corresponding property definition sections with anchors like [[maxcompactionlagms]] and [[mincompactionlagms]] to prevent broken links.
Learnt from: Feediver1
PR: redpanda-data/docs#1153
File: antora.yml:3-5
Timestamp: 2025-07-14T19:28:43.296Z
Learning: In Redpanda docs, during beta releases, the version metadata may intentionally show inconsistencies where the header displays the beta version (e.g., 25.2 Beta) while internal attributes like full-version, latest-redpanda-tag, operator-beta-tag still reference the stable version (e.g., 25.1). This is resolved during the GA merge process when all version references are synchronized.
Learnt from: paulohtb6
PR: redpanda-data/docs#0
File: :0-0
Timestamp: 2025-07-15T20:38:27.458Z
Learning: In Redpanda documentation, "Redpanda Data" refers to the company name, while "Redpanda" refers to the product name. These terms should be used appropriately based on context.
Learning from Feediver1 (PR redpanda-data/docs#1153, topic property cross-references and anchors) applied to files:
modules/upgrade/partials/rolling-upgrades/check-metrics.adoc
modules/manage/partials/monitor-health.adoc
Learning from Feediver1 (PR redpanda-data/docs#1153, version metadata during beta releases) applied to files:
modules/upgrade/partials/rolling-upgrades/check-metrics.adoc
Learning from JakeSCahill (PR redpanda-data/docs#1192, GiB versus GB memory units) applied to files:
modules/upgrade/partials/rolling-upgrades/check-metrics.adoc
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Redirect rules - redpanda-docs-preview
- GitHub Check: Header rules - redpanda-docs-preview
- GitHub Check: Pages changed - redpanda-docs-preview
rate(redpanda_rpc_received_bytes{redpanda_server="kafka"}[$__rate_interval])
----
rate(redpanda_rpc_sent_bytes{redpanda_server="kafka"}[$__rate_interval])
----
lgtm
==== Producer throughput

For the produce rate, create a query to get the produce rate across all topics:

[,promql]
----
-sum(rate(redpanda_kafka_request_bytes_total{redpanda_request="produce"} [5m] )) by (redpanda_request)
+rate(redpanda_rpc_received_bytes{redpanda_server="kafka"}[$__rate_interval])
Any comment on dropping sum? All series will be present in the result, I think it's consistent with line 96 right above?
> All series will be present in the result
Just to clarify, there is no topic here so there will be one series per broker.
> Any comment on dropping sum?
I mean this example query seems fairly arbitrary to me. You'll likely have to modify it anyway to add a cluster label or something.
We could just do an empty sum by, which again would aggregate everything, but that seems kinda wrong to me. No strong feelings though.
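For readers following the sum discussion, a small Python sketch of the difference between leaving one series per broker and aggregating with an empty sum by. The helper `sum_by`, the `instance` label values, and the rates are hypothetical; the sketch only mimics the grouping behavior and is not how Prometheus evaluates queries.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Labels = Dict[str, str]
Series = Tuple[Labels, float]  # (label set, per-second rate)


def sum_by(series: Iterable[Series], by: List[str]) -> Dict[Tuple[str, ...], float]:
    """Mimic PromQL `sum by (<labels>)`: group on the listed labels and add the values."""
    totals: Dict[Tuple[str, ...], float] = defaultdict(float)
    for labels, value in series:
        # Series sharing the same values for the `by` labels land in the same group.
        key = tuple(labels.get(name, "") for name in by)
        totals[key] += value
    return dict(totals)


# Hypothetical result of the rate() query: one series per broker.
per_broker = [
    ({"instance": "broker-0", "redpanda_server": "kafka"}, 20000.0),
    ({"instance": "broker-1", "redpanda_server": "kafka"}, 15000.0),
    ({"instance": "broker-2", "redpanda_server": "kafka"}, 5000.0),
]

# An empty `by` list collapses everything into a single total.
print(sum_by(per_broker, []))            # {(): 40000.0}
# Grouping by the broker label keeps one value per broker.
print(sum_by(per_broker, ["instance"]))
```

With several clusters scraped by the same Prometheus, the empty grouping would also merge clusters together, which is the reason given above for expecting users to add a cluster label themselves.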
I like it how you have it, not sure what I was even thinking here (as it's consistent with the other queries I can see in the doc).
Review complete with comments.
79679bc to 0597cfe
We added these metrics a long while ago for better monitoring of global produce/consume throughput. Unlike the per-partition metrics, they don't suffer from various issues (Prometheus bugs, including non-returned bytes). See redpanda-data/redpanda#14836 for more background. Hence, recommend them for total produce/consume throughput (TP) monitoring.
0597cfe to d4b56ce
Description
Resolves https://redpandadata.atlassian.net/browse/