Docs: align bottleneck troubleshooting guidance with new job/fragment CPU-memory-busy metrics #1065

@kwannoel

Description

Context

Internal Slack thread (2026-03-18, #product discussion) requested docs updates after Grafana/dashboard changes:

  • Storage alerts/metrics were refined to reduce noise.
  • Job/fragment-level CPU, memory, and busy-rate metrics are now available.
  • Dashboard tables now show bottleneck percentages by job.
  • Team feedback: Jarvis/docs still often tell users to rely on backpressure first, which may no longer match current troubleshooting expectations.

Related references:

Problem

Current user-facing troubleshooting guidance appears inconsistent with the latest PROD Grafana workflow. Users may still be directed to backpressure-centric debugging as the primary path, while the dashboard now exposes job/fragment-level CPU, memory, and busy-rate metrics, plus bottleneck ranking tables intended for faster bottleneck localization.

This can cause:

  • slower triage due to an outdated diagnostic sequence,
  • confusion between dashboard capabilities and docs/Jarvis recommendations,
  • inconsistent support guidance across engineers.

Suggested Fix

Update the troubleshooting and metrics docs to reflect an explicit, up-to-date workflow:

  1. Start from job/fragment bottleneck tables and job/fragment CPU/memory/busy-rate panels to identify likely hotspots.
  2. Use backpressure metrics as secondary corroboration and for graph-propagation analysis, rather than as the sole default entry point.
  3. Add a short “when to use what” decision table:
    • resource saturation suspected -> CPU/memory/busy-rate + bottleneck tables
    • downstream blockage propagation suspected -> backpressure/path tracing
  4. Ensure Jarvis-facing guidance snippets in docs are aligned with this flow so generated support responses follow the same order.
  5. Add/update screenshots for the new memory/fragment/job-level panels and bottleneck tables.
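
To make the proposed ordering concrete for the docs, the first two steps and the "when to use what" table could be sketched roughly as below. This is only an illustration: `FragmentStats`, its field names, the 80% threshold, and the panel labels are hypothetical placeholders, not actual Grafana panel or metric names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FragmentStats:
    """Hypothetical per-fragment readings, as percentages (0-100)."""
    job: str
    fragment_id: int
    cpu_pct: float
    memory_pct: float
    busy_rate_pct: float

    def peak(self) -> float:
        # Highest of the three resource readings for this fragment.
        return max(self.cpu_pct, self.memory_pct, self.busy_rate_pct)


# "When to use what": suspicion -> panels to check, in order.
# Keys and panel labels are illustrative only.
DECISION_TABLE = {
    "resource_saturation": [
        "job/fragment bottleneck tables",
        "job/fragment CPU/memory/busy-rate panels",
    ],
    "downstream_blockage": [
        "backpressure panels",
        "path tracing",
    ],
}


def panels_for(suspicion: str) -> list[str]:
    """Return the panels to check first for a given suspicion."""
    return DECISION_TABLE[suspicion]


def rank_hotspots(stats: list[FragmentStats], threshold: float = 80.0) -> list[FragmentStats]:
    """Step 1: keep fragments whose peak reading meets the (assumed)
    threshold, sorted from most to least saturated."""
    hot = [s for s in stats if s.peak() >= threshold]
    return sorted(hot, key=FragmentStats.peak, reverse=True)


if __name__ == "__main__":
    sample = [
        FragmentStats("job_a", 1, cpu_pct=95.0, memory_pct=40.0, busy_rate_pct=60.0),
        FragmentStats("job_b", 2, cpu_pct=30.0, memory_pct=20.0, busy_rate_pct=10.0),
        FragmentStats("job_a", 3, cpu_pct=50.0, memory_pct=85.0, busy_rate_pct=70.0),
    ]
    for s in rank_hotspots(sample):
        print(s.job, s.fragment_id, s.peak())
    # Step 2 would then use backpressure only to corroborate these candidates.
    print(panels_for("downstream_blockage"))
```

The point of the sketch is the order of operations: hotspots are ranked from resource readings first, and backpressure is consulted afterwards or when downstream blockage is the specific suspicion.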

Metadata

Labels

bug, documentation
