Open
Labels
bug, documentation
Description
Context
Internal Slack thread (2026-03-18, #product discussion) requested docs updates after Grafana/dashboard changes:
- Storage alerts/metrics were refined to reduce noise.
- Job/fragment-level CPU, memory, and busy-rate metrics are now available.
- Dashboard tables now show bottleneck percentages by job.
- Team feedback: Jarvis/docs still often tell users to rely on backpressure first, which may no longer match current troubleshooting expectations.
Related references:
- https://github.com/risingwavelabs/risingwave-docs/blob/main/performance/troubleshoot-high-latency.mdx
- https://github.com/risingwavelabs/risingwave-docs/blob/main/performance/metrics.mdx
Problem
The current user-facing troubleshooting guidance appears inconsistent with the latest PROD Grafana workflow. Users may still be guided to treat backpressure-centric debugging as the primary path, even though the dashboard now exposes job/fragment-level CPU, memory, and busy-rate metrics, plus bottleneck ranking tables, that are intended for faster bottleneck localization.
This can cause:
- slower triage due to an outdated diagnostic sequence,
- confusion between dashboard capabilities and docs/Jarvis recommendations,
- inconsistent support guidance across engineers.
Suggested Fix
Update the troubleshooting and metrics docs to reflect an explicit, current workflow:
- Start from job/fragment bottleneck tables and job/fragment CPU/memory/busy-rate panels to identify likely hotspots.
- Use backpressure metrics second, to corroborate the suspected hotspot and to analyze how blockage propagates through the graph, rather than as the only default entry point.
- Add a short "when to use what" decision table:
  - resource saturation suspected -> CPU/memory/busy-rate + bottleneck tables
  - downstream blockage propagation suspected -> backpressure/path tracing
- Ensure Jarvis-facing guidance snippets in docs are aligned with this flow so generated support responses follow the same order.
- Add/update screenshots for the new memory/fragment/job-level panels and bottleneck tables.
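To make the proposed ordering concrete for the docs (and for any Jarvis-facing snippet), the flow above could be sketched roughly as follows. This is only an illustration: `FragmentSample`, `suggest_entry_point`, the path labels, and the 0.8 threshold are all hypothetical names and values chosen for the example, not RisingWave or Grafana APIs, and real cutoffs would need to come from the dashboard owners.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: a single illustrative cutoff. The issue specifies no numeric thresholds.
HOT_THRESHOLD = 0.8


@dataclass
class FragmentSample:
    """Hypothetical per-fragment reading copied from the dashboard panels."""
    job: str
    fragment: int
    busy_rate: float      # fraction of time busy, 0.0 to 1.0
    backpressure: float   # fraction of time blocked on downstream, 0.0 to 1.0


def suggest_entry_point(
    samples: List[FragmentSample], threshold: float = HOT_THRESHOLD
) -> Tuple[str, List[FragmentSample]]:
    """Pick a starting path following the order proposed in this issue.

    1. Busy-rate hotspots first (resource saturation path).
    2. Backpressure second (downstream blockage / path tracing).
    """
    saturated = sorted(
        (s for s in samples if s.busy_rate >= threshold),
        key=lambda s: s.busy_rate,
        reverse=True,
    )
    if saturated:
        return "resource-saturation", saturated

    blocked = sorted(
        (s for s in samples if s.backpressure >= threshold),
        key=lambda s: s.backpressure,
        reverse=True,
    )
    if blocked:
        return "backpressure-tracing", blocked

    return "no-clear-hotspot", []
```

With one busy fragment and one backpressured fragment in the same input, the sketch returns the busy fragment first, which mirrors the intent that backpressure serves as corroboration rather than as the entry point.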