Feature: Runbooks for Incident Resolution (AI + SRE) #1

@sojan-official

Summary

Add the ability to define and store runbooks for common incidents so both AI agents and SREs can reference structured resolution steps during incident response.

Problem

When incidents occur, resolution steps often live in scattered docs, Slack threads, or individual engineers' memory.
This leads to:

  • Slower resolution times
  • Inconsistent handling
  • Knowledge silos
  • Limited AI-assisted troubleshooting

We need a structured, queryable way to store and retrieve incident runbooks.

Proposed Solution

Introduce Runbooks as a first-class entity:

  • Create / edit runbooks
  • Tag by service, severity, category
  • Structured steps (checklist format)
  • Attach logs, queries, dashboards, or links
  • Support markdown

Each runbook should include:

  • Title
  • Description
  • Affected services
  • Trigger conditions
  • Step-by-step resolution instructions
  • Escalation notes
  • Post-incident checklist
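
To make the field list above concrete, here is a minimal sketch of what a runbook record could look like. It is only one possible shape, not a proposed final schema: the class names `Runbook`, `Step`, and `Severity`, the four severity levels, and the `progress` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):
    # Hypothetical severity levels; the issue does not define a scale.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


@dataclass
class Step:
    """One checklist item in the resolution instructions."""
    instruction: str  # markdown text
    attachments: list[str] = field(default_factory=list)  # links to logs, queries, dashboards
    done: bool = False


@dataclass
class Runbook:
    title: str
    description: str
    affected_services: list[str]
    trigger_conditions: list[str]
    steps: list[Step]
    escalation_notes: str = ""
    post_incident_checklist: list[str] = field(default_factory=list)
    # Tags from the proposed solution (service is covered by affected_services).
    severity: Severity = Severity.MEDIUM
    category: str = ""

    def progress(self) -> float:
        """Fraction of checklist steps marked done (0.0 when there are no steps)."""
        if not self.steps:
            return 0.0
        return sum(step.done for step in self.steps) / len(self.steps)
```

Keeping steps as structured objects rather than one markdown blob is what would let a UI render a checklist and let an agent report how far along a resolution is.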

AI Integration

Runbooks should be:

  • Searchable via semantic search
  • Automatically suggested during incidents
  • Usable by AI agents to execute or recommend steps
  • Context-aware based on error signals

Example:

If the error rate spikes on api-service, suggest the “High 5xx Errors – API Service” runbook.
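
As a rough illustration of the suggestion flow in this example, the sketch below ranks runbooks against an incoming error signal. It deliberately swaps real semantic search for plain token overlap (Jaccard similarity) so that it runs with no dependencies; an actual implementation would more likely use embeddings. The function names, the `min_score` threshold, and the second runbook are hypothetical.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens; punctuation and dashes are dropped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def suggest(signal: str, runbooks: dict[str, str], min_score: float = 0.1) -> list[str]:
    """Return runbook titles ranked by token overlap with the error signal.

    `runbooks` maps a title to its trigger-condition text. Scoring is
    Jaccard similarity, a simple stand-in for semantic search.
    """
    signal_tokens = tokens(signal)
    scored = []
    for title, triggers in runbooks.items():
        runbook_tokens = tokens(title + " " + triggers)
        union = signal_tokens | runbook_tokens
        score = len(signal_tokens & runbook_tokens) / len(union) if union else 0.0
        if score >= min_score:
            scored.append((score, title))
    # Highest score first.
    return [title for _, title in sorted(scored, key=lambda pair: pair[0], reverse=True)]


# Hypothetical runbook index: title -> trigger conditions.
RUNBOOKS = {
    "High 5xx Errors – API Service": "error rate spike on api-service 5xx",
    "Disk Full – Database": "disk usage above 90% on db host",
}

if __name__ == "__main__":
    print(suggest("5xx error rate spike on api-service", RUNBOOKS))
```

With this sample data, only the API runbook clears the threshold, since the disk runbook shares just one common word with the signal.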

Benefits

  • Lower MTTR (mean time to resolution)
  • Consistent resolution
  • Easier onboarding of new SREs
  • Enables AI-assisted incident response
  • Institutional knowledge capture

Future Extensions

  • Link runbooks to specific alert rules
  • Auto-trigger runbooks
  • Execution logs tied to incidents
  • Feedback loop to improve runbooks over time

Would love community feedback on:

  • How you currently manage runbooks
  • What fields are essential
  • Whether AI-assisted execution would be useful

Open to refining the scope before implementation.
