High Availability (HA) Deployment: Eliminate SPOF via Replicated Deployment (on Docker Swarm & Kubernetes) #175

@jankarres

Problem
Peekaping can become a single point of failure in a single-node deployment. If the instance running the monitoring service goes down (for example due to hardware issues, restarts, updates, or network outages), all monitoring functionality is immediately lost. In production environments this is critical, because the monitoring system must remain reliable even when parts of the infrastructure fail.

Proposed solution
Enable a high-availability setup with three or more replicas in which exactly one replica performs monitoring checks at any given time. Use a leader-election mechanism for scheduler ownership, with automatic failover so that a standby replica takes over when the active one fails. Prevent duplicate checks during leadership transitions.
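
To make the leader-election idea concrete, here is a minimal sketch of a lease-based scheme. It is not Peekaping's implementation. `LeaseStore`, `try_acquire`, and `tick` are hypothetical names, and the in-memory store stands in for a shared coordination backend. A real backend would have to make the check-and-set atomic (for example with a conditional update) and supply the clock itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Lease:
    holder: str        # replica ID that currently owns the scheduler
    expires_at: float  # time in seconds after which the lease is void


class LeaseStore:
    """In-memory stand-in for a shared coordination backend (hypothetical)."""

    def __init__(self) -> None:
        self._lease: Optional[Lease] = None

    def try_acquire(self, replica_id: str, now: float, ttl: float) -> bool:
        """Acquire or renew the lease.

        Succeeds only if the lease is free, expired, or already held by
        the caller. A real backend must perform this check-and-set atomically.
        """
        lease = self._lease
        if lease is None or lease.expires_at <= now or lease.holder == replica_id:
            self._lease = Lease(holder=replica_id, expires_at=now + ttl)
            return True
        return False

    def leader(self, now: float) -> Optional[str]:
        """Return the current leader, or None if the lease has expired."""
        lease = self._lease
        if lease is not None and lease.expires_at > now:
            return lease.holder
        return None


def tick(store: LeaseStore, replicas: List[str], now: float, ttl: float) -> List[str]:
    """One heartbeat round: every live replica tries to acquire or renew.

    Returns the replicas that hold the lease after the round (at most one).
    """
    return [r for r in replicas if store.try_acquire(r, now, ttl)]
```

With a 15-second lease, replica `a` wins the first round and keeps renewing. If `a` stops sending heartbeats after renewing at t=5, the standbys are refused until the lease expires at t=20, at which point the first standby to ask takes over.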

Scope & Non-Goals

  • In scope: multi-replica operation; single active scheduler; leader election; clean failover behavior
  • Out of scope: provisioning and operation of external reverse proxies or load balancers; users handle these themselves

Suggestions for documentation

  • Provide reference deployments for Docker Swarm (simple HA entry point) and Kubernetes (for advanced setups)
  • Include guidance on running multiple replicas, health/readiness checks, and safe configuration of any shared state
  • Suggest Traefik as an example reverse proxy in the docs
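
As one possible starting point for the health/readiness guidance, the sketch below separates liveness from readiness. It assumes a design in which standby replicas still count as ready, so the orchestrator keeps routing traffic to them and rolling updates are not blocked by leadership. That design, and the names `ReplicaState`, `liveness`, and `readiness`, are illustrative assumptions, not existing Peekaping behavior.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ReplicaState:
    process_ok: bool    # main loop is running
    datastore_ok: bool  # shared state backend is reachable
    is_leader: bool     # replica currently owns the scheduler lease


def liveness(state: ReplicaState) -> Tuple[int, str]:
    """Liveness: restart the container only if the process itself is unhealthy."""
    return (200, "alive") if state.process_ok else (503, "dead")


def readiness(state: ReplicaState) -> Tuple[int, str]:
    """Readiness: accept traffic only if the shared datastore is reachable.

    Leadership is reported but does not affect the status code, so
    standbys stay in rotation (an assumption of this sketch).
    """
    if not state.process_ok or not state.datastore_ok:
        return 503, "not ready"
    role = "leader" if state.is_leader else "standby"
    return 200, f"ready ({role})"
```

Keeping leadership out of the status code means losing the lease never causes a replica to be restarted or pulled from the load balancer.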

Technical notes (informative)

  • Centralize scheduler state (e.g., datastore/coordination backend) to avoid split-brain and ensure idempotent job execution
  • Define reasonable leader heartbeat and takeover intervals; document expected failover timing
  • Clarify rolling-update behavior to avoid unnecessary leadership churn during upgrades
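
The notes on idempotent execution and failover timing could look roughly like the following sketch. It assumes each check is identified by a monitor ID plus a time slot, and that the shared datastore enforces uniqueness on that pair. `ClaimStore`, `slot_for`, `run_check_once`, and `worst_case_failover` are hypothetical names, and the in-memory dictionary stands in for a table with a unique key.

```python
from typing import Dict, Tuple


class ClaimStore:
    """In-memory stand-in for a shared table with a unique key on
    (monitor_id, slot). Hypothetical; a real backend would rely on a
    unique constraint or an atomic insert-if-absent."""

    def __init__(self) -> None:
        self._claims: Dict[Tuple[str, int], str] = {}

    def claim(self, monitor_id: str, slot: int, replica_id: str) -> bool:
        """Record that replica_id runs this check; False if already claimed."""
        key = (monitor_id, slot)
        if key in self._claims:
            return False
        self._claims[key] = replica_id
        return True


def slot_for(now: float, interval: float) -> int:
    """Map a timestamp to the index of the check interval it falls in."""
    return int(now // interval)


def run_check_once(store: ClaimStore, monitor_id: str, now: float,
                   interval: float, replica_id: str) -> bool:
    """Run the check only if no replica has claimed this slot yet."""
    if not store.claim(monitor_id, slot_for(now, interval), replica_id):
        return False  # another replica already ran (or is running) it
    # The actual monitoring check would be performed here.
    return True


def worst_case_failover(lease_ttl: float, retry_interval: float) -> float:
    """Upper bound on the time between a leader crash and takeover.

    The leader may crash right after renewing (lease_ttl remains), and a
    standby may poll just before expiry (one more retry_interval).
    Ignores clock skew and backend latency.
    """
    return lease_ttl + retry_interval
```

If two replicas both believe they are leader during a transition, only the first claim for a given slot succeeds, so the check runs once. With a 15-second lease and a 5-second retry interval, the documented worst-case failover would be about 20 seconds.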

Alternatives
To the best of my knowledge, there is no comparable open-source monitoring tool with built-in HA. The practical alternatives for enterprise use cases are cloud-hosted services (e.g. UptimeRobot) or a different kind of solution, such as an observability platform like New Relic.

Labels: enhancement (New feature or request)
